
Journal of Applied Statistics: latest articles

A review and comparison of methods of parameter estimation and inference for heteroskedastic linear regression models.
IF 1.1, Zone 4 (Mathematics), Q2 Statistics & Probability. Pub Date: 2025-05-06. eCollection Date: 2025-01-01. DOI: 10.1080/02664763.2025.2496719
Thomas Farrar, Renette Blignaut, Retha Luus, Sarel Steel

This article reviews methods of parameter estimation and inference in the linear regression model under heteroskedasticity. Several approaches to feasible weighted least squares estimation of the parameter vector are reviewed, along with various heteroskedasticity-consistent covariance matrix estimators, which are usually designed with inference as the end goal. A Monte Carlo experiment is designed to evaluate the ability of the reviewed methods to estimate three quantities: the variances of the random errors, the parameter vector, and the standard error of the ordinary least squares estimator thereof. Results of the experiment show that the homoskedastic variance estimator performs well at estimating error variances even in the heteroskedastic data-generating processes studied. Feasible weighted least squares approaches perform best for estimation of the parameter vector, whereas heteroskedasticity-consistent covariance matrix estimators perform best for estimation of the standard error thereof. This motivates a search for a method that would perform well in all three respects.
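
The contrast the abstract draws between the classical standard error and a heteroskedasticity-consistent one can be made concrete for simple regression with a single regressor. The sketch below is an illustration only, not the authors' Monte Carlo design: the function names are hypothetical, HC0 (White's estimator) stands in for the wider family of estimators reviewed, and the weights passed to the weighted fit are assumed known, whereas feasible weighted least squares would have to estimate them.

```python
import random


def ols_simple(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1, residuals)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    return b0, b1, resid


def slope_se_classical(x, resid):
    """Standard error of the slope under the constant-variance assumption."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    s2 = sum(e * e for e in resid) / (n - 2)  # homoskedastic variance estimate
    return (s2 / sxx) ** 0.5


def slope_se_hc0(x, resid):
    """HC0 (White) standard error of the slope: sandwich with squared residuals."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    meat = sum((xi - xbar) ** 2 * e * e for xi, e in zip(x, resid))
    return meat ** 0.5 / sxx


def wls_slope(x, y, w):
    """Weighted least squares slope; w are assumed-known weights (1/variance)."""
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
    num = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
    return num / den


if __name__ == "__main__":
    # Hypothetical data-generating process: error s.d. proportional to x.
    random.seed(0)
    x = [1 + 9 * i / 199 for i in range(200)]
    y = [1 + 2 * xi + random.gauss(0, 0.5 * xi) for xi in x]
    _, b1, resid = ols_simple(x, y)
    print("OLS slope:", b1)
    print("classical SE:", slope_se_classical(x, resid))
    print("HC0 SE:", slope_se_hc0(x, resid))
    print("WLS slope (true weights):", wls_slope(x, y, [1 / xi ** 2 for xi in x]))
```

When every squared residual is equal, the two standard errors differ only by the degrees-of-freedom factor n/(n-2); they diverge once the residual variance is related to the regressor.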

Journal of Applied Statistics, vol. 52, no. 16, pp. 3091-3120. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12683765/pdf/
Citations: 0
An empirical Bayes approach for constructing confidence intervals for clonality and entropy.
IF 1.1, Zone 4 (Mathematics), Q2 Statistics & Probability. Pub Date: 2025-04-30. DOI: 10.1080/02664763.2025.2496724
Zhongren Chen, Lu Tian, Richard A Olshen

This paper is motivated by the need to quantify human immune responses to environmental challenges. Specifically, the genome of the selected cell population from a blood sample is amplified by the PCR process, producing a large number of reads. Each read corresponds to a particular rearrangement of so-called V(D)J sequences. The observed data consist of a set of integers, representing numbers of reads corresponding to different V(D)J sequences. The underlying relative frequencies of distinct V(D)J sequences can be summarized by a probability vector, with the cardinality being the number of distinct V(D)J rearrangements. The statistical question is to make inferences on a summary parameter of this probability vector based on a multinomial-type observation of a large dimension. Popular summaries of the diversity include clonality and entropy. A point estimator of the clonality based on multiple replicates from the same blood sample has been proposed previously. Therefore, the remaining challenge is to construct confidence intervals of the parameters to reflect their uncertainty. In this paper, we propose to couple the Empirical Bayes method with a resampling-based calibration procedure to construct a robust confidence interval for different population diversity parameters. The method is illustrated via extensive numerical studies and real data examples.
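
As a small illustration of the two diversity summaries named above, the sketch below computes plug-in estimates from a vector of read counts. It assumes that entropy means Shannon entropy in natural-log units and that clonality is the sum of squared relative frequencies (one common definition); the abstract states neither, and the function names are hypothetical. It does not implement the empirical Bayes intervals the paper proposes.

```python
import math


def relative_frequencies(counts):
    """Convert read counts per V(D)J rearrangement into relative frequencies."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one read")
    return [c / total for c in counts]


def shannon_entropy(p):
    """Plug-in Shannon entropy (natural log); zero-frequency cells contribute 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def clonality(p):
    """Assumed definition: probability that two random reads share a rearrangement."""
    return sum(pi * pi for pi in p)


if __name__ == "__main__":
    reads = [120, 30, 30, 10, 5, 5]  # hypothetical read counts
    p = relative_frequencies(reads)
    print("entropy:", shannon_entropy(p))
    print("clonality:", clonality(p))
```

Plug-in estimates of this kind are known to be biased when many rearrangements are rare or unobserved, which is part of why interval estimation in this setting is hard.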

Journal of Applied Statistics. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12435542/pdf/
Citations: 0
Optimal distributed subsampling for accelerated failure time models with massive censored data.
IF 1.1, Zone 4 (Mathematics), Q2 Statistics & Probability. Pub Date: 2025-04-29. eCollection Date: 2025-01-01. DOI: 10.1080/02664763.2025.2495717
Chunjie Wang, Jing Li, Xiaohui Yuan

The availability of massive data stored across multiple locations is increasing in many fields. The data at each site often exhibits large-scale features. Current research primarily focuses on such datasets that consist of uncensored observations. As a popular model in survival analysis, the accelerated failure time (AFT) model provides an intuitive explanation of survival times, making the model results easier to understand in practical applications. In this paper, we develop a distributed subsampling procedure specifically designed for the AFT model. The consistency and asymptotic normality of the resulting estimator are proved. A two-step algorithm is provided to address practical implementation issues and to determine both the optimal subsampling probabilities and allocation sizes. We conduct numerical simulation studies to evaluate the performance of our method and apply it to a lymphoma dataset.

Journal of Applied Statistics, vol. 52, no. 16, pp. 3036-3052. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12683775/pdf/
Citations: 0
Inconsistency of three indices in measuring the association between the risk factor and the risk of a disease.
IF 1.1, Zone 4 (Mathematics), Q2 Statistics & Probability. Pub Date: 2025-04-25. eCollection Date: 2025-01-01. DOI: 10.1080/02664763.2025.2494132
Changyong Feng, Hongyue Wang, Honghong Liu

The relative risk (r), risk difference (d), and odds ratio (θ) are three commonly used indices in epidemiology to quantify the association between the risk of a disease and the exposure to a risk factor. However, it has been reported in [C. Feng, B. Wang, and H. Wang, The relations among three popular indices of risks, Stat. Med. 38 (2019), pp. 4772-4787] that there is no monotonic relationship between any two of these three indices. In fact, our research shows that even if two of these indices change in the same direction, the third one may change in the opposite direction. This indicates that there is an internal inconsistency among these three indices in measuring the association between the risk of the disease and the risk factor. Therefore, the sizes of these indices cannot be interpreted as the strength of the association. We have derived some limiting behaviors of these indices and discussed the approximation of risk ratio by odds ratio. Our results clarify some misconceptions about these three widely used indices. In summary, our research highlights the limitations of using only one of these indices to measure the association between the risk of a disease and the exposure to a risk factor. To fully understand the nature of the association, it is important to consider all three indices and their relationships.
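
A short numerical check makes the inconsistency tangible. The sketch below uses the standard definitions of the three indices in terms of the risk among the exposed (p1) and the unexposed (p0); the two scenarios are hypothetical numbers chosen for illustration, not data from the paper.

```python
def risk_indices(p1, p0):
    """Return (relative risk r, risk difference d, odds ratio theta).

    p1: disease risk among the exposed, p0: risk among the unexposed,
    both strictly between 0 and 1.
    """
    r = p1 / p0
    d = p1 - p0
    theta = (p1 / (1 - p1)) / (p0 / (1 - p0))
    return r, d, theta


if __name__ == "__main__":
    # Hypothetical scenario A: common disease; scenario B: rare disease.
    for label, p1, p0 in [("A", 0.6, 0.4), ("B", 0.10, 0.05)]:
        r, d, theta = risk_indices(p1, p0)
        print(label, "r =", round(r, 4), "d =", round(d, 4), "theta =", round(theta, 4))
```

Moving from scenario A to scenario B, the relative risk rises (1.5 to 2.0) while both the risk difference (0.2 to 0.05) and the odds ratio (2.25 to about 2.11) fall, so the ranking of the two scenarios by strength of association depends on which index is used.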

Journal of Applied Statistics, vol. 52, no. 16, pp. 3020-3035. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12683764/pdf/
Citations: 0
IMCKDE algorithm: an improvement in a clustering technique based on kernel density estimation.
IF 1.1, Zone 4 (Mathematics), Q2 Statistics & Probability. Pub Date: 2025-04-23. eCollection Date: 2025-01-01. DOI: 10.1080/02664763.2025.2495718
Paulo Muraro Ferreira, Mariana Kleina

Given the increasing volume of data available, much of which lacks established categories, the development of algorithms capable of finding patterns in raw, unclassified data is becoming increasingly important. One type of clustering algorithm is the MulticlusterKDE, which is based on the search for centroids by maximizing the kernel density estimation function, which assumes local maxima at points of highest data density. The aim of this work is to propose a clustering algorithm based on improvements to the MulticlusterKDE algorithm, named IMCKDE. These improvements occur both in terms of response quality and computational time. Furthermore, it was observed that the MulticlusterKDE algorithm has prohibitively long computation times for large datasets, highlighting the relevance of IMCKDE.
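
The general idea of taking centroids at local maxima of a kernel density estimate can be sketched in one dimension. The code below is a generic illustration under simplifying assumptions (Gaussian kernel, fixed bandwidth, maxima located on a grid, nearest-centroid assignment); it is neither MulticlusterKDE nor IMCKDE, whose search procedures the abstract does not describe, and all function names are hypothetical.

```python
import math


def gaussian_kde(points, h):
    """Return the Gaussian kernel density estimate with bandwidth h as a function."""
    norm = 1.0 / (len(points) * h * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - p) / h) ** 2) for p in points)

    return density


def kde_centroids(points, h, grid_size=400):
    """Grid locations where the estimated density has a local maximum."""
    lo = min(points) - 3 * h
    hi = max(points) + 3 * h
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    f = gaussian_kde(points, h)
    dens = [f(x) for x in grid]
    # Strict rise on the left, non-strict on the right, so a tie counts once.
    return [
        grid[i]
        for i in range(1, grid_size - 1)
        if dens[i] > dens[i - 1] and dens[i] >= dens[i + 1]
    ]


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: abs(x - centroids[k]))
        for x in points
    ]


if __name__ == "__main__":
    data = [-0.2, 0.0, 0.1, 0.4, 9.7, 10.0, 10.1, 10.3]  # hypothetical sample
    centers = kde_centroids(data, h=0.5)
    print("centroids:", centers)
    print("labels:", assign(data, centers))
```

Evaluating the density on a grid costs on the order of (number of points) times (grid size), which hints at why computation time becomes a concern for large datasets.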

Journal of Applied Statistics, vol. 52, no. 16, pp. 3053-3072. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12683736/pdf/
Citations: 0
Weighted negative binomial distribution: properties and applications.
IF 1.1, Zone 4 (Mathematics), Q2 Statistics & Probability. Pub Date: 2025-04-22. eCollection Date: 2025-01-01. DOI: 10.1080/02664763.2025.2492277
C Satheesh Kumar, Prince Sathyan

Here we consider a weighted version of the negative binomial distribution and illustrate its usefulness through fitting Covid-19 datasets. We obtain several important properties of the distribution, such as the probability generating function, cumulative distribution function, survival and hazard rate functions, as well as expressions for factorial and raw moments, and recurrence relations for probabilities. Further, we discuss the estimation of the parameters of the model and construct certain test procedures for examining the significance of the parameter. A simulation study is carried out for assessing the performance of the estimators of the parameters of the model obtained through the method of maximum likelihood.
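
For readers unfamiliar with weighted distributions, the general construction replaces a probability mass function P(x) by w(x)P(x)/E[w(X)] for a non-negative weight function w. The sketch below applies this to the negative binomial, building the base probabilities with the standard recurrence P(x+1) = P(x)(x+r)(1-p)/(x+1). The weight function used in the paper is not given in the abstract, so the size-biased choice w(x) = x in the demo is purely an assumption for illustration, as are the function names and the truncation point x_max.

```python
def nb_pmf_table(r, p, x_max):
    """Negative binomial probabilities P(0..x_max) with size r > 0 and success
    probability p, built from P(0) = p**r and the ratio recurrence."""
    probs = [p ** r]
    for x in range(x_max):
        probs.append(probs[-1] * (x + r) / (x + 1) * (1 - p))
    return probs


def weighted_pmf_table(weight, r, p, x_max):
    """Weighted version w(x)P(x)/E[w(X)], with the mean approximated on 0..x_max."""
    base = nb_pmf_table(r, p, x_max)
    raw = [weight(x) * px for x, px in enumerate(base)]
    total = sum(raw)  # truncated approximation of E[w(X)]
    return [v / total for v in raw]


if __name__ == "__main__":
    # Size-biased weight w(x) = x, an assumed example rather than the paper's choice.
    table = weighted_pmf_table(lambda x: x, r=3, p=0.5, x_max=200)
    print("first five weighted probabilities:", [round(v, 5) for v in table[:5]])
```

The truncation is harmless only when the tail beyond x_max carries negligible weighted mass, which should be checked for the parameter values at hand.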

Journal of Applied Statistics, vol. 52, no. 16, pp. 3003-3019. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12683738/pdf/
Citations: 0
Multiresolution granger causality testing with variational mode decomposition: a python software.
IF 1.1, Zone 4 (Mathematics), Q2 Statistics & Probability. Pub Date: 2025-04-16. eCollection Date: 2025-01-01. DOI: 10.1080/02664763.2025.2492257
Foued Saâdaoui, Hana Rabbouch

In this paper, we introduce a novel and advanced multiscale approach to Granger causality testing, achieved by integrating Variational Mode Decomposition (VMD) with traditional statistical causality methods. Our approach decomposes complex time series data into intrinsic mode functions (IMFs), each representing a distinct frequency scale, thus enabling a more precise and granular analysis of causal relationships across multiple scales. By applying Granger causality tests to the stationary IMFs, we uncover causal patterns that are often concealed in aggregated data, providing a more comprehensive understanding of the underlying system dynamics. This methodology is implemented in a Python-based software package, featuring an intuitive, user-friendly interface that enhances accessibility for both researchers and practitioners. The integration of VMD with Granger causality significantly enhances the flexibility and robustness of causal analysis, making it particularly effective in fields such as finance, engineering, and medicine, where data complexity is a significant challenge. Extensive empirical studies, including analyses of cryptocurrency data, biomedical signals, and simulation experiments, validate the effectiveness of our approach. Our method demonstrates a superior ability to reveal hidden causal interactions, offering greater accuracy and precision than leading existing techniques.

Journal of Applied Statistics, vol. 52, no. 16, pp. 3151-3172. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12683760/pdf/
Citations: 0
Estimating longitudinal biomarker effects using a Lasso-network constrained time-Varying mixed effects model.
IF 1.1, Zone 4 (Mathematics), Q2 Statistics & Probability. Pub Date: 2025-04-15. eCollection Date: 2025-01-01. DOI: 10.1080/02664763.2025.2490975
Shiqi Liu, Weiwei Zhuang, Jinfeng Xu, Steven Xu, Min Yuan

The relationship between covariates and outcomes can change over time, regardless of whether these covariates are time-varying or static. For instance, the influence of circulating biomarkers like white blood cell counts on the efficacy of standard chemotherapy in cancer patients may shift throughout the treatment duration. Traditional models with constant coefficients may fail to capture these dynamic interactions. Additionally, when multiple covariates are present, their interactions within and across time periods can become complex. To address these issues, we introduce a Lasso-Network constrained time-varying linear mixed-effects model (TVLMM) accompanied by an efficient two-stage parameter estimation algorithm that tracks the evolution of fixed-effect coefficients over time. We validate our approach through extensive simulations that highlight its effectiveness and computational efficiency in high-dimensional settings. Our method is further applied to real data from a randomized clinical trial of patients with metastatic colorectal cancer (mCRC), treated with standard chemotherapy with or without panitumumab. This case study demonstrates how our approach adeptly captures the time-varying impacts of critical circulating biomarkers on treatment outcomes, specifically tumor size reduction.

Journal of Applied Statistics, vol. 52, no. 16, pp. 2985-3002. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12683766/pdf/
Citations: 0
Estimation of quantile versions of the Lorenz curve and the Gini index for the generalized Pareto distribution.
IF 1.1, Zone 4 (Mathematics), Q2 Statistics & Probability. Pub Date: 2025-04-11. eCollection Date: 2025-01-01. DOI: 10.1080/02664763.2025.2490105
Alicja Jokiel-Rokita, Agnieszka Siedlaczek

This paper concerns the estimation of quantile versions of the Lorenz curve and the Gini index in the case of the generalized Pareto distribution. These curves and indices, unlike the Lorenz curve and the Gini index, are also defined for distributions whose expected value is not finite. The quantile versions of the Lorenz curve and the Gini index of the generalized Pareto distribution depend only on the shape parameter. Accuracy of the shape parameter estimators, recommended in the literature and those whose accuracy has not been studied so far, is compared in simulations. The accuracy of the plug-in estimators of the quantile versions of the Lorenz curve and the Gini index is also studied. Based on the simulations performed, if the sample size is not too large, we recommend using Zhang's estimator of the shape parameter in the estimation of quantile versions of the Lorenz curve and the Gini index. In case the shape parameter is small we also recommend the IPO estimator. The applications of the described methods in the real data analysis are also presented.

本文讨论了广义帕累托分布情况下洛伦兹曲线和基尼指数的分位数估计。与洛伦兹曲线和基尼指数不同,这些曲线和指数也是为期望值不是有限的分布而定义的。洛伦兹曲线的分位数版本和广义帕累托分布的基尼指数只依赖于形状参数。在仿真中比较了文献中推荐的形状参数估计器和目前尚未研究的形状参数估计器的精度。本文还研究了洛伦兹曲线和基尼指数的分位数版本的插入估计器的准确性。根据所进行的模拟,如果样本量不是太大,我们建议在估计Lorenz曲线和基尼指数的分位数版本时使用Zhang的形状参数估计器。如果形状参数很小,我们也推荐使用IPO估计器。并介绍了这些方法在实际数据分析中的应用。
Journal of Applied Statistics 52(15): 2941-2957. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12671419/pdf/
Citations: 0
Multivariate meta-analysis with a robustified diagonal likelihood function.
IF 1.1 Zone 4 Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2025-04-11 eCollection Date: 2025-01-01 DOI: 10.1080/02664763.2025.2487912
Zongliang Hu, Qianyu Zhou, Guanfu Liu

Multivariate meta-analysis is an efficient tool to analyze multivariate outcomes from independent studies, with the advantage of accounting for correlations between these outcomes. However, existing methods are sensitive to outliers in the data. In this paper, we propose new robust estimation methods for multivariate meta-analysis. In practice, within-study correlations are frequently not reported in studies, conventional robust multivariate methods using modified estimation equations may not be applicable. To address this challenge, we utilize robust functions to create new log-likelihood functions, by only using the diagonal components of the full covariance matrices. This approach bypasses the need for within-study correlations and also avoids the singularity problem of covariance matrices in the computation. Furthermore, the asymptotic distributions can automatically account for the missing correlations between multiple outcomes, enabling valid confidence intervals on functions of parameter estimates. Simulation studies and two real-data analyses are also carried out to demonstrate the advantages of our new robust estimation methods. Our primary focus is on bivariate meta-analysis, although the approaches can be applied more generally.

Journal of Applied Statistics 52(15): 2836-2872. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12671434/pdf/
Citations: 0