Pub Date: 2025-05-06 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2025.2496719
Thomas Farrar, Renette Blignaut, Retha Luus, Sarel Steel
This article reviews methods of parameter estimation and inference in the linear regression model under heteroskedasticity. Several approaches to feasible weighted least squares estimation of the parameter vector are reviewed, along with various heteroskedasticity-consistent covariance matrix estimators, which are usually designed with inference as the end goal. A Monte Carlo experiment is designed to evaluate the ability of the reviewed methods to estimate three quantities: the variances of the random errors, the parameter vector, and the standard error of the ordinary least squares estimator thereof. Results of the experiment show that the homoskedastic variance estimator performs well at estimating error variances even in the heteroskedastic data-generating processes studied. Feasible weighted least squares approaches perform best for estimation of the parameter vector, whereas heteroskedasticity-consistent covariance matrix estimators perform best for estimation of the standard error thereof. This motivates a search for a method that would perform well in all three respects.
Title: "A review and comparison of methods of parameter estimation and inference for heteroskedastic linear regression models." Journal of Applied Statistics 52(16), pp. 3091-3120. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12683765/pdf/
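The three estimation targets the experiment compares can be sketched in plain NumPy. The simulated skedastic function below (error SD proportional to the regressor) is an assumption for illustration, not the paper's experimental design; the sketch computes the OLS estimate of the parameter vector, HC3-type heteroskedasticity-consistent standard errors, and a feasible WLS estimate whose weights come from regressing log squared residuals on log x:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 5, n)
X = np.column_stack([np.ones(n), x])
sigma = 0.5 * x                                   # heteroskedastic DGP: error SD grows with x
y = 2.0 + 3.0 * x + rng.normal(0, sigma)

# OLS estimate of the parameter vector
XtX_inv = np.linalg.inv(X.T @ X)
beta_ols = XtX_inv @ X.T @ y
resid = y - X @ beta_ols

# HC3-type heteroskedasticity-consistent covariance (sandwich with leverage correction)
h = np.sum(X @ XtX_inv * X, axis=1)               # leverages h_i
u2 = (resid / (1 - h)) ** 2
cov_hc3 = XtX_inv @ (X.T * u2) @ X @ XtX_inv
se_hc3 = np.sqrt(np.diag(cov_hc3))

# Feasible WLS: model log squared residuals on log x to estimate error variances
G = np.column_stack([np.ones(n), np.log(x)])
gamma = np.linalg.lstsq(G, np.log(resid**2 + 1e-12), rcond=None)[0]
w = 1.0 / np.exp(G @ gamma)                       # weights = 1 / estimated variance
Xw = X * w[:, None]
beta_fwls = np.linalg.solve(Xw.T @ X, Xw.T @ y)
```

Under this DGP the log-variance regression is correctly specified, so FWLS should be close to the efficient estimator; HC3 only repairs the standard errors of OLS.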
Pub Date: 2025-04-30 | DOI: 10.1080/02664763.2025.2496724
Zhongren Chen, Lu Tian, Richard A Olshen
This paper is motivated by the need to quantify human immune responses to environmental challenges. Specifically, the genome of the selected cell population from a blood sample is amplified by the PCR process, producing a large number of reads. Each read corresponds to a particular rearrangement of so-called V(D)J sequences. The observed data consist of a set of integers, representing numbers of reads corresponding to different V(D)J sequences. The underlying relative frequencies of distinct V(D)J sequences can be summarized by a probability vector, with the cardinality being the number of distinct V(D)J rearrangements. The statistical question is to make inferences on a summary parameter of this probability vector based on a multinomial-type observation of a large dimension. Popular summaries of the diversity include clonality and entropy. A point estimator of the clonality based on multiple replicates from the same blood sample has been proposed previously. Therefore, the remaining challenge is to construct confidence intervals of the parameters to reflect their uncertainty. In this paper, we propose to couple the Empirical Bayes method with a resampling-based calibration procedure to construct a robust confidence interval for different population diversity parameters. The method is illustrated via extensive numerical studies and real data examples.
Title: "An empirical Bayes approach for constructing confidence intervals for clonality and entropy." Journal of Applied Statistics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12435542/pdf/
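The two diversity summaries have standard plug-in forms that the sketch below computes from raw read counts. The definitions used (Shannon entropy, and clonality as one minus entropy normalized by log K) are common conventions, not necessarily the paper's exact parameterization, and the empirical Bayes calibration of the confidence intervals is not reproduced:

```python
import numpy as np

def diversity_summaries(reads):
    """Plug-in entropy and clonality from V(D)J read counts.

    Assumes the common definitions: Shannon entropy H = -sum p_k log p_k
    over the observed relative frequencies, and clonality = 1 - H / log K,
    where K is the number of distinct sequences observed.
    """
    reads = np.asarray(reads, dtype=float)
    p = reads / reads.sum()
    p = p[p > 0]                      # drop zero-count sequences
    H = -np.sum(p * np.log(p))
    clonality = 1.0 - H / np.log(p.size)
    return H, clonality

H, c = diversity_summaries([500, 300, 150, 40, 10])
```

A perfectly even repertoire gives clonality 0; a repertoire dominated by one clone approaches 1. The plug-in estimates are biased for small read depth, which is part of what motivates the resampling-based calibration.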
Pub Date: 2025-04-29 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2025.2495717
Chunjie Wang, Jing Li, Xiaohui Yuan
The availability of massive data stored across multiple locations is increasing in many fields, and the data at each site often exhibit large-scale features. Current research primarily focuses on such datasets consisting of uncensored observations. As a popular model in survival analysis, the accelerated failure time (AFT) model provides an intuitive interpretation of survival times, making its results easier to understand in practical applications. In this paper, we develop a distributed subsampling procedure specifically designed for the AFT model. The consistency and asymptotic normality of the resulting estimator are proved. A two-step algorithm is provided to address practical implementation issues and to determine both the optimal subsampling probabilities and allocation sizes. We conduct numerical simulation studies to evaluate the performance of our method and apply it to a lymphoma dataset.
Title: "Optimal distributed subsampling for accelerated failure time models with massive censored data." Journal of Applied Statistics 52(16), pp. 3036-3052. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12683775/pdf/
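The two-step structure (pilot estimate, then informative subsampling) can be illustrated on an uncensored log-linear AFT model. This sketch is not the paper's censored estimator or its optimal probability formula: the sampling scores below are an A-optimality-flavored heuristic, and censoring is omitted entirely.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
log_t = 1.0 + 0.5 * x + rng.normal(scale=0.3, size=N)   # log survival times, no censoring

# Step 1: small uniform pilot subsample gives a rough estimate
pilot = rng.choice(N, 500, replace=False)
beta0 = np.linalg.lstsq(X[pilot], log_t[pilot], rcond=None)[0]

# Step 2: sampling probabilities proportional to residual influence
# (a heuristic stand-in, not the paper's exact optimal probabilities)
score = np.abs(log_t - X @ beta0) * np.linalg.norm(X, axis=1)
prob = score / score.sum()
idx = rng.choice(N, 2000, replace=True, p=prob)

# Inverse-probability-weighted least squares on the informative subsample
w = 1.0 / prob[idx]
Xw = X[idx] * w[:, None]
beta = np.linalg.solve(Xw.T @ X[idx], Xw.T @ log_t[idx])
```

The inverse-probability weights keep the subsample estimator consistent for the full-data target despite the non-uniform sampling.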
Pub Date: 2025-04-25 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2025.2494132
Changyong Feng, Hongyue Wang, Honghong Liu
The relative risk (r), risk difference (d), and odds ratio (θ) are three commonly used indices in epidemiology to quantify the association between the risk of a disease and the exposure to a risk factor. However, it has been reported in [C. Feng, B. Wang, and H. Wang, The relations among three popular indices of risks, Stat. Med. 38 (2019), pp. 4772-4787] that there is no monotonic relationship between any two of these three indices. In fact, our research shows that even if two of these indices change in the same direction, the third one may change in the opposite direction. This indicates that there is an internal inconsistency among these three indices in measuring the association between the risk of the disease and the risk factor. Therefore, the sizes of these indices cannot be interpreted as the strength of the association. We have derived some limiting behaviors of these indices and discussed the approximation of the risk ratio by the odds ratio. Our results clarify some misconceptions about these three widely used indices. In summary, our research highlights the limitations of using only one of these indices to measure the association between the risk of a disease and the exposure to a risk factor. To fully understand the nature of the association, it is important to consider all three indices and their relationships.
Title: "Inconsistency of three indices in measuring the association between the risk factor and the risk of a disease." Journal of Applied Statistics 52(16), pp. 3020-3035. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12683764/pdf/
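The non-monotonicity is easy to exhibit numerically. In the pair of scenarios below (hypothetical risks chosen for illustration), both the relative risk and the risk difference increase while the odds ratio decreases:

```python
def indices(p1, p0):
    """Relative risk r, risk difference d, and odds ratio theta
    for exposed risk p1 and unexposed risk p0."""
    r = p1 / p0
    d = p1 - p0
    theta = (p1 / (1 - p1)) / (p0 / (1 - p0))
    return r, d, theta

# From scenario a to scenario b: r and d both increase, yet theta decreases.
a = indices(0.99, 0.90)   # r = 1.10, d = 0.09, theta = 11.0
b = indices(0.60, 0.45)   # r ≈ 1.33, d = 0.15, theta ≈ 1.83
```

The odds ratio blows up as the risks approach 1, so it can move opposite to r and d when risks shift from extreme to moderate values, which is exactly the internal inconsistency the abstract describes.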
Pub Date: 2025-04-23 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2025.2495718
Paulo Muraro Ferreira, Mariana Kleina
Given the increasing volume of available data, much of which lacks established categories, algorithms capable of finding patterns in raw, unclassified data are becoming increasingly important. One such clustering algorithm is MulticlusterKDE, which searches for centroids by maximizing the kernel density estimation function, which attains local maxima at points of highest data density. The aim of this work is to propose a clustering algorithm, named IMCKDE, based on improvements to the MulticlusterKDE algorithm in both solution quality and computational time. Furthermore, it was observed that the MulticlusterKDE algorithm has prohibitively long computation times for large datasets, highlighting the relevance of IMCKDE.
Title: "IMCKDE algorithm: an improvement in a clustering technique based on kernel density estimation." Journal of Applied Statistics 52(16), pp. 3053-3072. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12683736/pdf/
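The core idea, finding centroids at local maxima of a kernel density estimate, can be illustrated with mean-shift ascent on a Gaussian KDE. This is a standard hill-climbing scheme, not the MulticlusterKDE or IMCKDE algorithm itself, and the bandwidth below is an arbitrary choice for the toy data:

```python
import numpy as np

def climb(points, x, h, steps=100):
    """Mean-shift ascent: iterate toward a local maximum of a Gaussian KDE
    with bandwidth h, starting from x."""
    for _ in range(steps):
        w = np.exp(-np.sum((points - x) ** 2, axis=1) / (2 * h * h))
        x_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < 1e-8:
            break
        x = x_new
    return x

rng = np.random.default_rng(2)
pts = np.vstack([rng.normal([0, 0], 0.3, (100, 2)),     # two well-separated clusters
                 rng.normal([4, 4], 0.3, (100, 2))])
mode = climb(pts, pts[0], h=0.5)                        # converges to the nearby density peak
```

Starting the ascent from every data point and merging the arrival points yields one centroid per density mode, which is the centroid-search step this family of algorithms is built on.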
Pub Date: 2025-04-22 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2025.2492277
C Satheesh Kumar, Prince Sathyan
Here we consider a weighted version of the negative binomial distribution and illustrate its usefulness by fitting COVID-19 datasets. We obtain several important properties of the distribution, such as the probability generating function, cumulative distribution function, survival and hazard rate functions, as well as expressions for factorial and raw moments and recurrence relations for probabilities. Further, we discuss the estimation of the model parameters and construct certain test procedures for examining the significance of the parameter. A simulation study is carried out to assess the performance of the maximum likelihood estimators of the model parameters.
Title: "Weighted negative binomial distribution: properties and applications." Journal of Applied Statistics 52(16), pp. 3003-3019. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12683738/pdf/
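The abstract does not specify the weight function, so the weighted model itself cannot be reproduced here; as a baseline, the sketch below fits the standard negative binomial by maximum likelihood with SciPy, the same estimation route the simulation study uses:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import nbinom

rng = np.random.default_rng(3)
data = nbinom.rvs(5, 0.4, size=2000, random_state=rng)   # simulated counts, true (r, p) = (5, 0.4)

def negloglik(params):
    r, p = params
    if r <= 0 or not 0 < p < 1:
        return np.inf                                    # keep the search inside the parameter space
    return -nbinom.logpmf(data, r, p).sum()

res = minimize(negloglik, x0=[1.0, 0.5], method="Nelder-Mead")
r_hat, p_hat = res.x
```

A weighted version would replace `nbinom.logpmf` with the weighted pmf (weight function times the NB pmf, renormalized); the optimization scaffolding stays the same.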
Pub Date: 2025-04-16 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2025.2492257
Foued Saâdaoui, Hana Rabbouch
In this paper, we introduce a novel multiscale approach to Granger causality testing, achieved by integrating Variational Mode Decomposition (VMD) with traditional statistical causality methods. Our approach decomposes complex time series data into intrinsic mode functions (IMFs), each representing a distinct frequency scale, thus enabling a more precise and granular analysis of causal relationships across multiple scales. By applying Granger causality tests to the stationary IMFs, we uncover causal patterns that are often concealed in aggregated data, providing a more comprehensive understanding of the underlying system dynamics. This methodology is implemented in a Python-based software package, featuring an intuitive, user-friendly interface that enhances accessibility for both researchers and practitioners. The integration of VMD with Granger causality significantly enhances the flexibility and robustness of causal analysis, making it particularly effective in fields such as finance, engineering, and medicine, where data complexity is a significant challenge. Extensive empirical studies, including analyses of cryptocurrency data, biomedical signals, and simulation experiments, validate the effectiveness of our approach. Our method demonstrates a superior ability to reveal hidden causal interactions, offering greater accuracy and precision than leading existing techniques.
Title: "Multiresolution Granger causality testing with variational mode decomposition: a Python software." Journal of Applied Statistics 52(16), pp. 3151-3172. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12683760/pdf/
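The causality test applied to each IMF is the classical Granger F-test on nested lag regressions. The sketch below implements only that test (the VMD decomposition step is omitted, and in the paper the test would be run on each IMF rather than on the raw series):

```python
import numpy as np
from scipy import stats

def granger_f(x, y, lags=2):
    """F-test of the null 'x does not Granger-cause y' via nested lag regressions."""
    n = len(y)
    Y = y[lags:]
    # Restricted model: lags of y only; full model adds lags of x.
    Z_r = np.column_stack([np.ones(n - lags)] +
                          [y[lags - k:n - k] for k in range(1, lags + 1)])
    Z_f = np.column_stack([Z_r] +
                          [x[lags - k:n - k] for k in range(1, lags + 1)])
    rss = lambda Z: np.sum((Y - Z @ np.linalg.lstsq(Z, Y, rcond=None)[0]) ** 2)
    rss_r, rss_f = rss(Z_r), rss(Z_f)
    df1, df2 = lags, n - lags - Z_f.shape[1]
    F = ((rss_r - rss_f) / df1) / (rss_f / df2)
    return F, stats.f.sf(F, df1, df2)

# Toy system: y is driven by lagged x, x is independent noise.
rng = np.random.default_rng(4)
x = rng.normal(size=600)
y = np.zeros(600)
for t in range(2, 600):
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()
F, pval = granger_f(x, y)          # strong evidence that x Granger-causes y
```

Running this test on each stationary IMF, rather than on the aggregated series, is what lets the multiscale approach localize causality to particular frequency bands.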
Pub Date: 2025-04-15 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2025.2490975
Shiqi Liu, Weiwei Zhuang, Jinfeng Xu, Steven Xu, Min Yuan
The relationship between covariates and outcomes can change over time, regardless of whether these covariates are time-varying or static. For instance, the influence of circulating biomarkers like white blood cell counts on the efficacy of standard chemotherapy in cancer patients may shift throughout the treatment duration. Traditional models with constant coefficients may fail to capture these dynamic interactions. Additionally, when multiple covariates are present, their interactions within and across time periods can become complex. To address these issues, we introduce a Lasso-Network constrained time-varying linear mixed-effects model (TVLMM) accompanied by an efficient two-stage parameter estimation algorithm that tracks the evolution of fixed-effect coefficients over time. We validate our approach through extensive simulations that highlight its effectiveness and computational efficiency in high-dimensional settings. Our method is further applied to real data from a randomized clinical trial of patients with metastatic colorectal cancer (mCRC), treated with standard chemotherapy with or without panitumumab. This case study demonstrates how our approach adeptly captures the time-varying impacts of critical circulating biomarkers on treatment outcomes, specifically tumor size reduction.
Title: "Estimating longitudinal biomarker effects using a Lasso-network constrained time-varying mixed effects model." Journal of Applied Statistics 52(16), pp. 2985-3002. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12683766/pdf/
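The notion of a fixed-effect coefficient that evolves over time can be illustrated very simply: estimate the coefficient separately in each time period and apply lasso-style soft-thresholding. This is only a caricature of the method — the Lasso-network constraint, the random effects, and the two-stage algorithm are not reproduced, and the data below are simulated:

```python
import numpy as np

def soft_threshold(b, lam):
    """Lasso-style shrinkage operator."""
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

rng = np.random.default_rng(5)
T, n_per = 6, 200
betas_true = np.linspace(0.0, 2.5, T)      # the biomarker effect strengthens over time
est = []
for t in range(T):
    x = rng.normal(size=n_per)
    y = betas_true[t] * x + rng.normal(scale=0.5, size=n_per)
    X = np.column_stack([np.ones(n_per), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    est.append(soft_threshold(b[1], lam=0.05))  # shrink small per-period slopes to zero
```

The recovered trajectory `est` tracks the growing true effect, with the early near-zero effects shrunk exactly to zero — the qualitative behavior a time-varying lasso-constrained model is after.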
Pub Date: 2025-04-11 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2025.2490105
Alicja Jokiel-Rokita, Agnieszka Siedlaczek
This paper concerns the estimation of quantile versions of the Lorenz curve and the Gini index in the case of the generalized Pareto distribution. These curves and indices, unlike the Lorenz curve and the Gini index, are also defined for distributions whose expected value is not finite. The quantile versions of the Lorenz curve and the Gini index of the generalized Pareto distribution depend only on the shape parameter. The accuracy of shape parameter estimators, both those recommended in the literature and those whose accuracy has not been studied so far, is compared in simulations. The accuracy of the plug-in estimators of the quantile versions of the Lorenz curve and the Gini index is also studied. Based on the simulations performed, if the sample size is not too large, we recommend using Zhang's estimator of the shape parameter in the estimation of quantile versions of the Lorenz curve and the Gini index. When the shape parameter is small, we also recommend the IPO estimator. Applications of the described methods to real data are also presented.
Title: "Estimation of quantile versions of the Lorenz curve and the Gini index for the generalized Pareto distribution." Journal of Applied Statistics 52(15), pp. 2941-2957. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12671419/pdf/
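Since the quantile versions depend only on the shape parameter, the whole problem reduces to estimating that parameter well. Zhang's and the IPO estimators are not reproduced here; the sketch below shows the maximum likelihood baseline with SciPy's `genpareto` (shape parameter `c`, location fixed at 0), into which any plug-in formula for the quantile Lorenz curve or Gini index would then substitute the estimate:

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(6)
xi_true = 0.5                      # shape: mean is finite only for xi < 1,
                                   # but the quantile versions remain defined beyond that
data = genpareto.rvs(xi_true, scale=1.0, size=5000, random_state=rng)

# Maximum likelihood fit with location fixed at 0
xi_hat, loc, scale_hat = genpareto.fit(data, floc=0)
```

With heavy tails and small samples, MLE can be unstable, which is why the simulations in the paper favor alternative shape estimators in that regime.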
Pub Date: 2025-04-11 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2025.2487912
Zongliang Hu, Qianyu Zhou, Guanfu Liu
Multivariate meta-analysis is an efficient tool for analyzing multivariate outcomes from independent studies, with the advantage of accounting for correlations between these outcomes. However, existing methods are sensitive to outliers in the data. In this paper, we propose new robust estimation methods for multivariate meta-analysis. In practice, within-study correlations are frequently not reported, so conventional robust multivariate methods using modified estimating equations may not be applicable. To address this challenge, we use robust functions to create new log-likelihood functions, using only the diagonal components of the full covariance matrices. This approach bypasses the need for within-study correlations and also avoids the singularity problem of covariance matrices in the computation. Furthermore, the asymptotic distributions can automatically account for the missing correlations between multiple outcomes, enabling valid confidence intervals on functions of parameter estimates. Simulation studies and two real-data analyses are also carried out to demonstrate the advantages of our new robust estimation methods. Our primary focus is on bivariate meta-analysis, although the approaches can be applied more generally.
Title: "Multivariate meta-analysis with a robustified diagonal likelihood function." Journal of Applied Statistics 52(15), pp. 2836-2872. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12671434/pdf/
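The flavor of "robust estimation from diagonal information only" can be sketched for a single outcome: pool study effects using only their within-study variances, with Huber-type downweighting of outlying studies. This is a generic iteratively reweighted scheme, not the paper's robustified likelihood, and the data are made up for illustration:

```python
import numpy as np

def huber_pooled(effects, variances, c=1.345, iters=50):
    """Robust pooled effect using only within-study variances
    (the diagonal information), via iteratively reweighted Huber estimation."""
    mu = np.average(effects, weights=1 / variances)       # start from the naive pooled value
    for _ in range(iters):
        z = (effects - mu) / np.sqrt(variances)           # standardized residuals
        w = np.where(np.abs(z) <= c, 1.0, c / np.abs(z)) / variances
        mu_new = np.average(effects, weights=w)
        if abs(mu_new - mu) < 1e-10:
            break
        mu = mu_new
    return mu

effects = np.array([0.10, 0.12, 0.08, 0.11, 0.95])        # last study is an outlier
variances = np.array([0.01, 0.02, 0.015, 0.01, 0.02])
mu_robust = huber_pooled(effects, variances)
mu_naive = np.average(effects, weights=1 / variances)     # pulled upward by the outlier
```

The robust pooled estimate stays near the bulk of the studies while the inverse-variance-weighted mean is dragged toward the outlier; applying such downweighting coordinate-wise is the diagonal idea the paper builds its likelihood around.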