Journal of Applied Statistics: latest articles, page 6

Change point detection to analyze air pollution and its economic effects: an exponentially weighted moving average perspective.

IF 1.1, Zone 4 (Mathematics), Q2 STATISTICS & PROBABILITY

Journal of Applied Statistics

Pub Date : 2025-02-02 eCollection Date: 2025-01-01 DOI: 10.1080/02664763.2025.2455636

Shabbir Ahmad, Muhammad Riaz, Tahir Mahmood, Nasir Abbas

Air pollution has a direct impact on every society, leading to consequential effects on the economy of a nation. Poor air quality adversely affects human health, resulting in various economic outcomes such as rising healthcare costs, diminished labor productivity, negative impacts on tourism and living standards, increased regulatory expenses for businesses, and heightened economic disparities. Effective control methods are essential to monitor factors influencing the economy, including air quality. The presence of toxic substances in the air reduces air quality, necessitating its monitoring through indices like PM10. Among statistical process control tools, control charts are the most prominent for efficient change point detection. This study introduces a new process monitoring tool that incorporates additional auxiliary information, if available, alongside the main variable of interest. The proposed methodology ensures detection ability remains robust, even under disturbances in the auxiliary variable. Furthermore, mathematical analyses reveal that many existing statistical quality control tools become special cases of the proposed structure for specific sensitivity parameter values. Evaluated through properties of run length distribution, the proposed chart allows control of the robustness-efficiency balance by adjusting its sensitivity parameter. A practical implementation demonstrates the effectiveness of the chart in monitoring air quality data.
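For orientation, the classical EWMA chart that this line of work builds on can be sketched as follows. This is only the textbook chart, not the proposed auxiliary-information chart: the function name `ewma_chart`, the smoothing constant `lam`, and the limit multiplier `L` are illustrative assumptions, and the in-control mean `mu0` and standard deviation `sigma` are taken as known.

```python
import math


def ewma_chart(xs, mu0, sigma, lam=0.2, L=3.0):
    """Classical EWMA chart (textbook form, assumed known mu0 and sigma).

    Recursion: z_t = lam * x_t + (1 - lam) * z_{t-1}, with z_0 = mu0.
    Returns the EWMA values and a parallel list of out-of-control flags.
    """
    z = mu0
    zs, signals = [], []
    for t, x in enumerate(xs, start=1):
        z = lam * x + (1.0 - lam) * z
        # Exact variance factor of z_t: lam/(2-lam) * (1 - (1-lam)^(2t)).
        half_width = L * sigma * math.sqrt(
            lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        )
        zs.append(z)
        signals.append(abs(z - mu0) > half_width)
    return zs, signals


# Five in-control readings followed by a sustained upward shift of 3 sigma.
values, flags = ewma_chart([0.0] * 5 + [3.0] * 10, mu0=0.0, sigma=1.0)
first_signal = flags.index(True)  # position of the first flagged reading
```

Setting `lam=1.0` removes the smoothing, so the chart reduces to a Shewhart-type chart on the raw readings; smaller values of `lam` give more weight to history and are usually preferred for small sustained shifts.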

Volume 52, issue 11, pages 2113-2155. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12404093/pdf/

Citations: 0

On the use and misuse of time-rescaling to assess the goodness-of-fit of self-exciting temporal point processes.

IF 1.1, Zone 4 (Mathematics), Q2 STATISTICS & PROBABILITY

Journal of Applied Statistics

Pub Date : 2025-02-02 eCollection Date: 2025-01-01 DOI: 10.1080/02664763.2025.2459245

M-A El-Aroui

The paper first highlights important drawbacks and biases related to the common use of time-rescaling to assess the goodness-of-fit (Gof) of self-exciting temporal point process (SETPP) models. It then presents a new predictive time-rescaling approach leading to an asymptotically unbiased Gof framework for general SETPPs in the case of single observed trajectories. The predictive approach focuses on forecasting accuracy and addresses the bias problem resulting from plugged-in estimated parameters. Dawid's prequential approach is used, and model checking is mainly based on the forecasting accuracy of arrival times. These times are transformed, using sequentially estimated parameters, into random vectors which are proved to converge in probability, under the null hypothesis and standard regularity conditions, to vectors of iid Exponential(1) random variables. Numerical experiments are used to compare the performance of standard and predictive time-rescaling for Gof assessment of non-homogeneous Poisson and Hawkes self-exciting temporal processes. Data on Japanese seismic events are also used to illustrate the dynamic aspect of the proposed model-checking approach.
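The standard time-rescaling idea referred to above can be illustrated with a small sketch. It uses a non-homogeneous Poisson process with a hypothetical linear intensity lambda(t) = A + B*t (the constants and function names are assumptions for illustration) and the true compensator, so it shows only the idealised case; the bias discussed in the abstract arises when estimated parameters are plugged in instead.

```python
import math
import random

# Hypothetical linear intensity lambda(t) = A + B*t,
# so the compensator is Lambda(t) = A*t + B*t^2/2.
A, B = 1.0, 0.5


def compensator(t):
    return A * t + 0.5 * B * t * t


def inverse_compensator(s):
    # Positive root of 0.5*B*t^2 + A*t - s = 0.
    return (-A + math.sqrt(A * A + 2.0 * B * s)) / B


def simulate(n, seed=0):
    """Simulate n event times by inverting the compensator on unit-rate arrivals."""
    rng = random.Random(seed)
    unit_gaps = [rng.expovariate(1.0) for _ in range(n)]
    s, times = 0.0, []
    for g in unit_gaps:
        s += g
        times.append(inverse_compensator(s))
    return times, unit_gaps


def rescale(times, comp):
    """Map event times to rescaled gaps Lambda(t_i) - Lambda(t_{i-1})."""
    gaps, prev = [], 0.0
    for t in times:
        cur = comp(t)
        gaps.append(cur - prev)
        prev = cur
    return gaps


times, unit_gaps = simulate(200)
rescaled = rescale(times, compensator)
```

Under the true model the rescaled gaps are exactly the unit-rate exponential draws used in the simulation, which is why they can be compared against the Exponential(1) distribution in a goodness-of-fit check.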

Volume 52, issue 12, pages 2247-2270. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416029/pdf/

Citations: 0

Gradient test to assess homogeneity of probabilities in discrete-time transition models with application in agricultural science data.

IF 1.1, Zone 4 (Mathematics), Q2 STATISTICS & PROBABILITY

Journal of Applied Statistics

Pub Date : 2025-02-02 eCollection Date: 2025-01-01 DOI: 10.1080/02664763.2025.2457008

Laura Vicuña Torres de Paula, Idemauro Antonio Rodrigues de Lara, Cesar Augusto Taconeli, Carolina Reigada, Rafael de Andrade Moral

Longitudinal studies in discrete or continuous time involving categorical data are common in agricultural sciences. Transition models can be used to analyse the resulting data, especially when the aim is to describe category changes over time and to accommodate covariates arising from the experimental design. Here we focus on discrete-time models, for which it is critical to assess whether the underlying process is stationary. Tests based on likelihood procedures are very useful, and here we propose the gradient test to assess stationarity, that is, homogeneity of the transition probabilities. We carried out simulation studies to evaluate the proposed test, which performed well in terms of type-I error and power when compared with other classical tests available in the literature. As motivation we present two studies with agricultural data: the first is an entomology application with nominal responses, and the second concerns the degree of injury in pigs. Using the proposed test, stationarity and non-stationarity were verified, respectively, in the two applications. Since the gradient test for stationarity has a simpler structure than other tests, it is a useful alternative when carrying out inference in these types of models.
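As a rough illustration of what a gradient statistic looks like, the sketch below treats the simplest possible setting: a first-order chain with no covariates, parameterised directly by its transition probabilities. These are simplifying assumptions, not the authors' model, and the function name and data layout are hypothetical. The statistic has the generic form S = U(theta_tilde)^T (theta_hat - theta_tilde), where theta_tilde is the estimate under homogeneity (pooled over time), theta_hat is the unrestricted estimate (one matrix per time step), and U is the score function.

```python
def gradient_homogeneity(counts):
    """Gradient statistic for time-homogeneity of transition probabilities.

    counts[t][i][j] is the number of i -> j transitions observed at step t.
    Assumes no covariates and the probability parameterisation, in which
    the statistic reduces to sum n_ij(t) * (p_hat_ij(t) - p_tilde_ij) / p_tilde_ij.
    """
    n_steps = len(counts)
    n_states = len(counts[0])
    stat = 0.0
    for i in range(n_states):
        # Pooled counts for row i give the restricted (homogeneous) estimate.
        pooled = [sum(counts[t][i][j] for t in range(n_steps))
                  for j in range(n_states)]
        pooled_total = sum(pooled)
        for t in range(n_steps):
            row_total = sum(counts[t][i])
            for j in range(n_states):
                n = counts[t][i][j]
                if n == 0:
                    continue  # zero counts contribute nothing to the sum
                p_tilde = pooled[j] / pooled_total
                p_hat = n / row_total
                stat += n * (p_hat - p_tilde) / p_tilde
    return stat


example_counts = [
    [[8, 2], [3, 7]],  # transitions observed at step 1
    [[5, 5], [6, 4]],  # transitions observed at step 2
]
example_stat = gradient_homogeneity(example_counts)
```

In this covariate-free setting the statistic coincides algebraically with the Pearson chi-square statistic for the same hypothesis, and it would usually be referred to a chi-square distribution with (T - 1) K (K - 1) degrees of freedom for T time steps and K states. With covariates or a logit parameterisation the two statistics generally differ.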

Volume 52, issue 11, pages 2172-2190. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12404091/pdf/

Citations: 0

Pathway-based genetic association analysis for overdispersed count data.

IF 1.1, Zone 4 (Mathematics), Q2 STATISTICS & PROBABILITY

Journal of Applied Statistics

Pub Date : 2025-02-02 eCollection Date: 2025-01-01 DOI: 10.1080/02664763.2025.2460073

Yang Liu

Overdispersion is a common phenomenon in genetic data, such as gene expression count data. In genetic association studies, it is important to investigate the association between a gene expression and a set of genetic variants from a pathway. However, existing approaches for pathway analysis are primarily designed for continuous and binary outcomes and are not applicable to overdispersed count data. In this paper, we propose a hierarchical approach to analyze the association between an overdispersed count response and a set of low-frequency genetic variants in negative binomial regression. We derive score-type test statistics for both fixed and random effects of genetic variants, and further introduce a novel procedure for efficiently combining these two statistics for global testing. Through simulation studies, we demonstrate that the proposed method tends to be more powerful than existing methods under a wide range of scenarios. Additionally, we apply the proposed method to a colorectal cancer study, demonstrating its power in identifying associations between gene expression and somatic mutations.

Citations: 0

Penalized functional regression using R package PFLR.

IF 1.1, Zone 4 (Mathematics), Q2 STATISTICS & PROBABILITY

Journal of Applied Statistics

Pub Date : 2025-01-28 eCollection Date: 2025-01-01 DOI: 10.1080/02664763.2025.2457011

Rob Cameron, Tianyu Guan, Haolun Shi, Zhenhua Lin

Penalized functional regression is a useful tool to estimate models for applications where the effect/coefficient function is assumed to be truncated. The truncated coefficient function occurs when the functional predictor does not influence the response after a certain cutoff point on the time domain. The R package PFLR offers an extensive suite of methods for advanced functional regression techniques with penalization. The package implements four distinct methods, each tailored to different models, effectively addressing a range of scenarios. This is demonstrated through simulations as well as an application to particulate matter emissions data. Generic S3 methods are also implemented for each model to help with summary, visualization and interpretation.

Citations: 0

Clustering of recurrent events data.

IF 1.1, Zone 4 (Mathematics), Q2 STATISTICS & PROBABILITY

Journal of Applied Statistics

Pub Date : 2025-01-28 eCollection Date: 2025-01-01 DOI: 10.1080/02664763.2025.2452966

G Babykina, V Vandewalle, J Carretero-Bravo

Data are nowadays often timestamped; thus, when analysing events that may occur several times (recurrent events), it is desirable to model the whole dynamics of the counting process rather than to focus on the total number of events. Such data are encountered in hospital readmissions, disease recurrences and repeated failures of industrial systems. Recurrent events can be analysed in the counting process framework, as in the Andersen-Gill model, assuming that the baseline intensity depends on time and on covariates, as in the Cox model. However, observed covariates are often insufficient to explain the observed heterogeneity in the data. We propose a mixture model for recurrent events that accounts for the unobserved heterogeneity and performs clustering of individuals (unsupervised classification that partitions the heterogeneous data according to unobserved, or latent, variables). Within each cluster, the recurrent event process intensity is specified parametrically and is adjusted for covariates. Model parameters are estimated by maximum likelihood using the EM algorithm; the BIC criterion is adopted to choose an optimal number of clusters. The feasibility of the model is checked on simulated data. Real data on hospital readmissions of elderly people, which motivated the development of the proposed clustering model, are analysed. The results give a detailed understanding of the recurrent event process in each cluster.

Volume 52, issue 11, pages 2031-2059. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12404095/pdf/

Citations: 0

Upper quantile-based CUSUM-type control chart for detecting small changes in image data.

IF 1.1, Zone 4 (Mathematics), Q2 STATISTICS & PROBABILITY

Journal of Applied Statistics

Pub Date : 2025-01-27 eCollection Date: 2025-01-01 DOI: 10.1080/02664763.2025.2456614

Anik Roy, Partha Sarathi Mukherjee

Image monitoring is an important research problem with wide applications in various fields, including manufacturing industries, satellite imaging and medical diagnostics. Traditional image monitoring control charts perform rather poorly when the changes occur in very small regions of the image and when the changes in image intensity values are small in those regions. Their performance gets worse if the images contain noise and the changes occur near the edges of image objects. In applications such as manufacturing industries, the changes in the images are often too small to be detected by human eyes. In this article, we propose a CUSUM-type control chart for online monitoring of grayscale images. Depending on what kind of changes we wish to detect, big or small, we propose to use a certain upper quantile of the local CUSUM statistics. We incorporate a state-of-the-art jump-preserving image smoothing technique in the proposed chart that ensures good performance even in the presence of low to moderate noise. Theoretical justifications and superior performance in numerical comparisons ensure that the proposed control chart can be useful to many researchers and practitioners.
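The idea of charting an upper quantile of local CUSUM statistics can be sketched in a few lines. This is a deliberately simplified illustration: it runs an independent one-sided CUSUM at every pixel and omits the jump-preserving smoothing step, and the function names, the allowance `k`, and the quantile level `q` are assumptions rather than the authors' choices. Images are represented as flat lists of pixel intensities.

```python
import math


def upper_quantile(values, q):
    """Empirical q-quantile: the ceil(q * n)-th smallest value, for q in (0, 1]."""
    ordered = sorted(values)
    idx = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[idx]


def quantile_cusum(frames, baseline, k=0.5, q=0.95):
    """Per-frame upper quantile of pixel-wise one-sided CUSUM statistics.

    frames   : sequence of images, each a flat list of pixel intensities
    baseline : in-control mean intensity of each pixel
    k        : allowance (reference value) subtracted at every step
    """
    cusum = [0.0] * len(baseline)
    stats = []
    for frame in frames:
        # One-sided recursion: C_t = max(0, C_{t-1} + (x_t - mu0) - k).
        cusum = [max(0.0, c + (x - m) - k)
                 for c, x, m in zip(cusum, frame, baseline)]
        stats.append(upper_quantile(cusum, q))
    return stats


# Four-pixel toy image in which only the last pixel shifts upward.
frames = [[0, 0, 0, 0], [0, 0, 0, 2], [0, 0, 0, 2]]
baseline = [0, 0, 0, 0]
high_q = quantile_cusum(frames, baseline, q=1.0)  # reacts to the single pixel
median = quantile_cusum(frames, baseline, q=0.5)  # stays at zero
```

The toy example shows why the quantile level matters: a change confined to a small region moves only the upper tail of the local statistics, so a high quantile reacts while a central one does not. A signal would be raised when the charted quantile exceeds a control limit, which in practice is calibrated to a target in-control run length.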

Volume 52, issue 11, pages 2156-2171. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12404064/pdf/

Citations: 0

Derivation of a multivariate longitudinal causal effects model.

IF 1.1, Zone 4 (Mathematics), Q2 STATISTICS & PROBABILITY

Journal of Applied Statistics

Pub Date : 2025-01-24 eCollection Date: 2025-01-01 DOI: 10.1080/02664763.2025.2457013

Halima S Twabi, Samuel O M Manda, Dylan S Small, Hans-Peter Kohler

This paper presents a causal inference estimation method for longitudinal observational studies with multiple outcomes. The method uses marginal structural models with inverse probability treatment weights (MSM-IPTWs). In developing the proposed method, we re-define the weights as a product of inverse weights at each time point, accounting for time-varying confounders and treatment exposures and for possible correlation between and within (serial) the multiple outcomes. The proposed method is evaluated by simulation studies and with an application to estimate the effect of HIV positivity awareness on condom use and multiple sexual partners using the Malawi Longitudinal Study of Families and Health (MLSFH) data. The simulation study shows that the joint MSM-IPTW performs well, with coverage within the expected 95% level for a large sample size (n = 1000) and moderate to strong between- and within-outcome correlation strengths ($\rho_{j} = 0.3, 0.75$; $\rho_{k} = 0.4, 0.8$) when the effects are similar. The joint MSM-IPTW performed about the same as the adjusted standard joint model when the treatment effect estimate was the same for the outcomes. In the application, HIV positivity awareness increased the usage of condoms and did not affect the number of sexual partners. We recommend using the proposed MSM-IPTWs to correctly control for time-varying treatment and confounders when estimating causal effects for longitudinal observational studies with multiple outcomes.
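The phrase "product of inverse weights at each time point" can be made concrete with the standard unstabilised inverse probability of treatment weight for a single subject. The sketch below is the textbook construction only; it does not include the adjustment for correlation among multiple outcomes that the paper introduces, and the function name and inputs are hypothetical. The propensities are assumed to come from a separately fitted treatment model.

```python
def iptw_weights(treatments, propensities):
    """Cumulative unstabilised IPT weights for one subject.

    treatments   : observed binary treatments a_1, ..., a_T (0 or 1)
    propensities : fitted P(A_t = 1 | treatment and covariate history)
    Returns w_t = prod_{s <= t} 1 / P(A_s = a_s | history) for t = 1, ..., T.
    """
    if len(treatments) != len(propensities):
        raise ValueError("treatments and propensities must have equal length")
    weights, w = [], 1.0
    for a, p in zip(treatments, propensities):
        # Probability of the treatment level that was actually received.
        prob_observed = p if a == 1 else 1.0 - p
        if not 0.0 < prob_observed <= 1.0:
            raise ValueError("positivity violated: probability must be in (0, 1]")
        w /= prob_observed
        weights.append(w)
    return weights


# Treated, untreated, treated, with propensities 0.5, 0.25 and 0.8.
subject_weights = iptw_weights([1, 0, 1], [0.5, 0.25, 0.8])
```

Here the factors are 1/0.5, 1/0.75 and 1/0.8, so the weights grow multiplicatively over time. In applied work stabilised weights, which place the marginal treatment probability in the numerator, are often preferred because they reduce the variance of the weighted estimator.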

Volume 52, issue 12, pages 2207-2225. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416008/pdf/

Citations: 0

Causal effect estimation for competing risk data in randomized trial: adjusting covariates to gain efficiency.

IF 1.1, Zone 4 (Mathematics), Q2 STATISTICS & PROBABILITY

Journal of Applied Statistics

Pub Date : 2025-01-24 eCollection Date: 2025-01-01 DOI: 10.1080/02664763.2025.2455626

Youngjoo Cho, Cheng Zheng, Lihong Qi, Ross L Prentice, Mei-Jie Zhang

The double-blinded randomized trial is considered the gold standard to estimate the average causal effect (ACE). The naive estimator without adjusting for any covariate is consistent. However, incorporating covariates that are strong predictors of the outcome can reduce the problem of unbalanced covariate distributions between the treatment and control groups and can improve efficiency. Recent work has shown that, thanks to randomization, for linear regression an estimator under risk consistency (e.g. Random Forest) for the regression coefficients can maintain the convergence rate even when a nonparametric model is assumed for the effect of covariates. Also, such an adjusted estimator will always lead to an efficiency gain compared to the naive unadjusted estimator. In this paper, we extend this result to the competing risk data setting and show that, under similar assumptions, the augmented inverse probability censoring weighting (AIPCW) based adjusted estimator has the same convergence rate and efficiency gain. Extensive simulations were performed to show the efficiency gain in the finite sample setting. To illustrate our proposed method, we apply it to the Women's Health Initiative (WHI) dietary modification trial studying the effect of a low-fat diet on cardiovascular disease (CVD) related mortality among those who have prior CVD.

Volume 52, issue 11, pages 2094-2112. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12404078/pdf/

Citations: 0

Integrative rank-based regression for multi-source high-dimensional data with multi-type responses.

IF 1.1, Zone 4 (Mathematics), Q2 STATISTICS & PROBABILITY

Journal of Applied Statistics

Pub Date : 2025-01-16 eCollection Date: 2025-01-01 DOI: 10.1080/02664763.2025.2452964

Fuzhi Xu, Shuangge Ma, Qingzhao Zhang

Practical scenarios often present instances where the types of responses differ between multi-source datasets, reflecting distinct attributes or characteristics. In this paper, an integrative rank-based regression is proposed to facilitate information sharing among varied datasets with multi-type responses. Taking advantage of rank-based regression, our proposed approach adeptly tackles differences in the magnitude of loss functions. In addition, it can robustly handle outliers and data contamination, and effectively mitigate model misspecification. Extensive numerical simulations demonstrate the superior and competitive performance of the proposed approach in model estimation and variable selection. Analysis of genetic data on HNSC and LUAD yields results with biological explanations and confirms its practical usefulness.

Citations: 0