
Journal of Applied Statistics: Latest Articles

Gene mutation estimations via mutual information and Ewens sampling based CNN & machine learning algorithms.
IF 1.1; Zone 4 (Mathematics); Q2 Statistics & Probability; Pub Date: 2025-02-03; eCollection Date: 2025-01-01; DOI: 10.1080/02664763.2025.2460076
Wanyang Dai

We estimate gene mutation rates by developing a convolutional neural network (CNN) and machine learning algorithms based on mutual information and Ewens sampling. More precisely, we develop a systematic methodology by constructing a CNN, together with two machine learning algorithms, to study protein production with target gene sequences and protein structures. The core of the approach is a two-stage optimization problem that balances gene mutation rates during protein production: we optimally coordinate the consistency between the given input DNA sequences and the given (or optimally computed) target ones by controlling their intermediate gene mutation rates. The aim is to support gene editing and protein structure prediction. For example, once the gene mutation rates are estimated, the computational complexity of protein structure prediction is reduced to a reasonable degree. The CNN numerical optimization scheme consists of the two newly designed machine learning algorithms. Their stochastic gradients are designed according to the Kuhn-Tucker conditions with boundary constraints, supported by Ewens sampling, multi-input multi-output (MIMO) mutual information, and codon optimization techniques. The associated learning rate bounds are derived explicitly, and the two algorithms are implemented numerically. The convergence and optimality of the algorithms are proved mathematically. To illustrate the use of the study, we also conduct an implementation on real-world data.
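
The abstract relies on Ewens sampling without stating it. As a rough illustration only, the sketch below evaluates the classical Ewens sampling formula for an allelic partition and checks that the probabilities over all partitions of a small sample sum to one. The function name and the mutation parameter value are assumptions of this sketch; it does not reproduce the paper's CNN or its stochastic gradients.

```python
from math import factorial


def ewens_probability(counts, theta):
    """Probability of an allelic partition under the Ewens sampling formula.

    counts[j - 1] is the number of allele types observed exactly j times,
    so the sample size is n = sum(j * counts[j - 1]).
    theta is the scaled mutation parameter (hypothetical value in this sketch).
    """
    n = sum(j * a for j, a in enumerate(counts, start=1))
    # Rising factorial theta * (theta + 1) * ... * (theta + n - 1).
    rising = 1.0
    for i in range(n):
        rising *= theta + i
    prob = factorial(n) / rising
    for j, a in enumerate(counts, start=1):
        prob *= theta ** a / (j ** a * factorial(a))
    return prob


# All partitions of a sample of size 3, written as (a_1, a_2, a_3).
partitions_n3 = [(3, 0, 0), (1, 1, 0), (0, 0, 1)]
total = sum(ewens_probability(p, theta=1.5) for p in partitions_n3)
```

Larger values of `theta` shift probability toward partitions with many distinct allele types, which is why the formula can inform a mutation rate estimate.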

Journal of Applied Statistics 52(12): 2321-2353. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416021/pdf/
Citations: 0
Change point detection to analyze air pollution and its economic effects: an exponentially weighted moving average perspective.
IF 1.1; Zone 4 (Mathematics); Q2 Statistics & Probability; Pub Date: 2025-02-02; eCollection Date: 2025-01-01; DOI: 10.1080/02664763.2025.2455636
Shabbir Ahmad, Muhammad Riaz, Tahir Mahmood, Nasir Abbas

Air pollution has a direct impact on every society, leading to consequential effects on the economy of a nation. Poor air quality adversely affects human health, resulting in various economic outcomes such as rising healthcare costs, diminished labor productivity, negative impacts on tourism and living standards, increased regulatory expenses for businesses, and heightened economic disparities. Effective control methods are essential to monitor factors influencing the economy, including air quality. The presence of toxic substances in the air reduces air quality, necessitating its monitoring through indices like PM10. Among statistical process control tools, control charts are the most prominent for efficient change point detection. This study introduces a new process monitoring tool that incorporates additional auxiliary information, if available, alongside the main variable of interest. The proposed methodology ensures detection ability remains robust, even under disturbances in the auxiliary variable. Furthermore, mathematical analyses reveal that many existing statistical quality control tools become special cases of the proposed structure for specific sensitivity parameter values. Evaluated through properties of run length distribution, the proposed chart allows control of the robustness-efficiency balance by adjusting its sensitivity parameter. A practical implementation demonstrates the effectiveness of the chart in monitoring air quality data.
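
For readers unfamiliar with the chart family named in the title, here is a minimal sketch of the classical EWMA control chart with time-varying limits. It is not the auxiliary-information chart proposed in the paper; the smoothing constant `lam`, the limit width `L`, and the PM10-like series are hypothetical values chosen for illustration.

```python
import math


def ewma_chart(xs, mu0, sigma, lam=0.2, L=3.0):
    """Return (statistic, lower limit, upper limit, signal) per observation."""
    z = mu0  # the EWMA statistic starts at the in-control mean
    rows = []
    for t, x in enumerate(xs, start=1):
        z = lam * x + (1.0 - lam) * z
        # Exact time-varying standard deviation of z_t for i.i.d. observations.
        sd = sigma * math.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t)))
        lcl, ucl = mu0 - L * sd, mu0 + L * sd
        rows.append((z, lcl, ucl, z < lcl or z > ucl))
    return rows


def first_signal(rows):
    """1-based index of the first out-of-control point, or None."""
    for t, (_, _, _, signal) in enumerate(rows, start=1):
        if signal:
            return t
    return None


# Hypothetical PM10-like series: in control at 50, then a sustained shift to 56.
series = [50.0] * 10 + [56.0] * 10
rows = ewma_chart(series, mu0=50.0, sigma=2.0)
change_detected_at = first_signal(rows)
```

A smaller `lam` gives more weight to history and favors small sustained shifts, while a larger `lam` reacts faster to large shifts.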

Journal of Applied Statistics 52(11): 2113-2155. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12404093/pdf/
Citations: 0
On the use and misuse of time-rescaling to assess the goodness-of-fit of self-exciting temporal point processes.
IF 1.1; Zone 4 (Mathematics); Q2 Statistics & Probability; Pub Date: 2025-02-02; eCollection Date: 2025-01-01; DOI: 10.1080/02664763.2025.2459245
M-A El-Aroui

The paper first highlights important drawbacks and biases related to the common use of time-rescaling to assess the goodness-of-fit (Gof) of self-exciting temporal point process (SETPP) models. It then presents a new predictive time-rescaling approach leading to an asymptotically unbiased Gof framework for general SETPPs in the case of single observed trajectories. The predictive approach focuses on forecasting accuracy and addresses the bias problem resulting from plugged-in estimated parameters. Dawid's prequential approach is used, and model checking is mainly based on the forecasting accuracy of arrival times. These times are transformed, using sequentially estimated parameters, into random vectors which are proved to converge in probability, under the null hypothesis and standard regularity conditions, to vectors of i.i.d. Exponential(1) random variables. Numerical experiments are used to compare the performance of standard and predictive time-rescaling for Gof assessment of non-homogeneous Poisson and Hawkes self-exciting temporal processes. Data on Japanese seismic events are also used to illustrate the dynamic aspect of the proposed model-checking approach.
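
To make the standard (plug-in) time-rescaling step concrete, the sketch below computes compensator increments for a Hawkes process with an exponential kernel, intensity mu + alpha * sum(exp(-beta * (t - s))). The kernel choice and parameter names are assumptions of this sketch; the paper's predictive variant, which re-estimates parameters sequentially, is not implemented here.

```python
import math


def hawkes_compensator(t, events, mu, alpha, beta):
    """Integrated intensity of an exponential-kernel Hawkes process on [0, t]."""
    excitation = sum(1.0 - math.exp(-beta * (t - s)) for s in events if s < t)
    return mu * t + (alpha / beta) * excitation


def rescaled_interarrivals(events, mu, alpha, beta):
    """Compensator increments between consecutive (sorted) event times."""
    increments, previous = [], 0.0
    for t in events:
        current = hawkes_compensator(t, events, mu, alpha, beta)
        increments.append(current - previous)
        previous = current
    return increments


# Hypothetical event times and parameters.
events = [0.5, 1.5, 3.0]
taus = rescaled_interarrivals(events, mu=2.0, alpha=0.8, beta=1.2)
```

Under the true parameters the increments behave like i.i.d. Exponential(1) draws, so they can be compared with that distribution. The abstract's warning concerns what happens when estimated parameters are plugged in instead.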

Journal of Applied Statistics 52(12): 2247-2270. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416029/pdf/
Citations: 0
Gradient test to assess homogeneity of probabilities in discrete-time transition models with application in agricultural science data.
IF 1.1; Zone 4 (Mathematics); Q2 Statistics & Probability; Pub Date: 2025-02-02; eCollection Date: 2025-01-01; DOI: 10.1080/02664763.2025.2457008
Laura Vicuña Torres de Paula, Idemauro Antonio Rodrigues de Lara, Cesar Auguto Taconeli, Carolina Reigada, Rafael de Andrade Moral

Longitudinal studies in discrete or continuous time involving categorical data are common in agricultural sciences. Transition models can be used to analyse the resulting data, especially when the aim is to describe category changes over time and to accommodate covariates arising from the experimental design. Here we focus on discrete-time models, for which it is critical to assess whether the underlying process is stationary. Tests based on likelihood procedures are very useful, and here we propose the gradient test to assess stationarity, that is, homogeneity of transition probabilities. We carried out simulation studies to evaluate the performance of the proposed test, which indicated good performance regarding type-I error and power when compared to other classical tests available in the literature. As motivation we present two studies with agricultural data: the first is an entomology application with nominal responses, and the second concerns the degree of injury in pigs. Using our proposed test, stationarity and non-stationarity were verified, respectively, in the two applications. Since the gradient test for stationarity has a simpler structure than other tests, it is a useful alternative when carrying out inference in these types of models.
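
As a hedged illustration, the general gradient statistic has the form S = U(θ̃)ᵀ(θ̂ − θ̃), where U is the score, θ̃ the estimate under the null, and θ̂ the unrestricted estimate. The sketch below evaluates it for a saturated multinomial transition model with no covariates, which is an assumption of this sketch and simpler than the regression-type models the abstract describes. The count-array layout is also hypothetical.

```python
def gradient_statistic(counts_by_time):
    """Gradient statistic for homogeneity of transition probabilities.

    counts_by_time[t][i][j] is the number of transitions from state i to
    state j observed in period t (saturated model, no covariates).
    """
    periods = len(counts_by_time)
    k = len(counts_by_time[0])
    stat = 0.0
    for i in range(k):
        # Pooled counts give the restricted estimate of row i under homogeneity.
        pooled = [sum(counts_by_time[t][i][j] for t in range(periods)) for j in range(k)]
        pooled_total = sum(pooled)
        if pooled_total == 0:
            continue
        for t in range(periods):
            row = counts_by_time[t][i]
            row_total = sum(row)
            if row_total == 0:
                continue
            for j in range(k):
                if pooled[j] == 0:
                    continue  # row[j] is then 0 as well, so the term vanishes
                p_tilde = pooled[j] / pooled_total  # restricted estimate
                p_hat = row[j] / row_total          # period-specific estimate
                stat += row[j] * (p_hat - p_tilde) / p_tilde
    return stat


# Two periods, two states: the first row changes over time, the second does not.
period_1 = [[8, 2], [3, 7]]
period_2 = [[2, 8], [3, 7]]
stat = gradient_statistic([period_1, period_2])
```

Only the unrestricted and restricted estimates are needed, with no information matrix to invert, which matches the abstract's remark about the test's simple structure.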

Journal of Applied Statistics 52(11): 2172-2190. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12404091/pdf/
Citations: 0
Pathway-based genetic association analysis for overdispersed count data.
IF 1.1; Zone 4 (Mathematics); Q2 Statistics & Probability; Pub Date: 2025-02-02; eCollection Date: 2025-01-01; DOI: 10.1080/02664763.2025.2460073
Yang Liu

Overdispersion is a common phenomenon in genetic data, such as gene expression count data. In genetic association studies, it is important to investigate the association between a gene expression and a set of genetic variants from a pathway. However, existing approaches for pathway analysis are primarily designed for continuous and binary outcomes and are not applicable to overdispersed count data. In this paper, we propose a hierarchical approach to analyze the association between an overdispersed count response and a set of low-frequency genetic variants in negative binomial regression. We derive score-type test statistics for both fixed and random effects of genetic variants, and further introduce a novel procedure for efficiently combining these two statistics for global testing. Through simulation studies, we demonstrate that the proposed method tends to be more powerful than existing methods under a wide range of scenarios. Additionally, we apply the proposed method to a colorectal cancer study, demonstrating its power in identifying associations between gene expression and somatic mutations.

Journal of Applied Statistics 52(12): 2306-2320. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416034/pdf/
Citations: 0
Penalized functional regression using R package PFLR.
IF 1.1; Zone 4 (Mathematics); Q2 Statistics & Probability; Pub Date: 2025-01-28; eCollection Date: 2025-01-01; DOI: 10.1080/02664763.2025.2457011
Rob Cameron, Tianyu Guan, Haolun Shi, Zhenhua Lin

Penalized functional regression is a useful tool to estimate models for applications where the effect/coefficient function is assumed to be truncated. The truncated coefficient function occurs when the functional predictor does not influence the response after a certain cutoff point on the time domain. The R package PFLR offers an extensive suite of methods for advanced functional regression techniques with penalization. The package implements four distinct methods, each tailored to different models, effectively addressing a range of scenarios. This is demonstrated through simulations as well as an application to particulate matter emissions data. Generic S3 methods are also implemented for each model to help with summary, visualization and interpretation.

Journal of Applied Statistics 52(11): 2191-2205. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12424440/pdf/
Citations: 0
Clustering of recurrent events data.
IF 1.1; Zone 4 (Mathematics); Q2 Statistics & Probability; Pub Date: 2025-01-28; eCollection Date: 2025-01-01; DOI: 10.1080/02664763.2025.2452966
G Babykina, V Vandewalle, J Carretero-Bravo

Data are now often timestamped. When analysing events that may occur several times (recurrent events), it is therefore desirable to model the whole dynamics of the counting process rather than to focus on the total number of events. Such data are encountered in hospital readmissions, disease recurrences and repeated failures of industrial systems. Recurrent events can be analysed in the counting process framework, as in the Andersen-Gill model, assuming that the baseline intensity depends on time and on covariates, as in the Cox model. However, observed covariates are often insufficient to explain the observed heterogeneity in the data. We propose a mixture model for recurrent events that accounts for the unobserved heterogeneity and clusters individuals (unsupervised classification that partitions the heterogeneous data according to unobserved, or latent, variables). Within each cluster, the recurrent event process intensity is specified parametrically and is adjusted for covariates. Model parameters are estimated by maximum likelihood using the EM algorithm; the BIC criterion is adopted to choose an optimal number of clusters. The feasibility of the model is checked on simulated data. Real data on hospital readmissions of elderly people, which motivated the development of the proposed clustering model, are analysed. The results give a detailed understanding of the recurrent event process in each cluster.

Journal of Applied Statistics 52(11): 2031-2059. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12404095/pdf/
Citations: 0
Upper quantile-based CUSUM-type control chart for detecting small changes in image data.
IF 1.1; Zone 4 (Mathematics); Q2 Statistics & Probability; Pub Date: 2025-01-27; eCollection Date: 2025-01-01; DOI: 10.1080/02664763.2025.2456614
Anik Roy, Partha Sarathi Mukherjee

Image monitoring is an important research problem with wide applications in various fields, including manufacturing, satellite imaging and medical diagnostics. Traditional image monitoring control charts perform rather poorly when the changes occur in very small regions of the image and when the changes in image intensity values are small in those regions. Their performance gets worse if the images contain noise and the changes occur near the edges of image objects. In applications such as manufacturing, the changes in the images are often too small to be detected by the human eye. In this article, we propose a CUSUM-type control chart for online monitoring of grayscale images. Depending on what kind of changes we wish to detect, big or small, we propose to use a certain upper quantile of the local CUSUM statistics. We incorporate a state-of-the-art jump-preserving image smoothing technique in the proposed chart, which ensures good performance even in the presence of low to moderate noise. Theoretical justifications and superior performance in numerical comparisons suggest that the proposed control chart can be useful to many researchers and practitioners.
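
The idea of charting an upper quantile of local CUSUM statistics can be sketched as follows. This is a simplified illustration under stated assumptions: images are flat lists of pixel intensities, each pixel carries a one-sided upward CUSUM with allowance `k`, the quantile uses the nearest-rank rule, and the threshold `h` is a hypothetical value. The jump-preserving smoothing step and the paper's exact statistic are not reproduced.

```python
import math


def upper_quantile(values, q):
    """Empirical q-quantile by the nearest-rank rule (0 < q <= 1)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def monitor(images, baseline, sigma, k=0.5, q=0.95, h=4.0):
    """Return the 1-based index of the first image that signals, or None."""
    cusum = [0.0] * len(baseline)
    for t, image in enumerate(images, start=1):
        for p, (x, m) in enumerate(zip(image, baseline)):
            # One-sided local CUSUM on the standardized pixel deviation.
            cusum[p] = max(0.0, cusum[p] + (x - m) / sigma - k)
        if upper_quantile(cusum, q) > h:
            return t
    return None


def make_stream(n_pixels, n_shifted, shift, n_before=5, n_after=10):
    """Hypothetical noise-free stream: flat images, then a small shifted region."""
    clean = [0.0] * n_pixels
    changed = [shift] * n_shifted + [0.0] * (n_pixels - n_shifted)
    return [clean] * n_before + [changed] * n_after


baseline = [0.0] * 100
signal_10_pixels = monitor(make_stream(100, 10, 1.5), baseline, sigma=1.0)
signal_3_pixels = monitor(make_stream(100, 3, 1.5), baseline, sigma=1.0)
```

The quantile level controls how small a changed region the chart can see: with `q=0.95`, a change confined to fewer than about 5% of the pixels does not move the charted statistic, so a higher quantile is needed for smaller regions.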

图像监控是一个重要的研究问题,在制造业、卫星成像、医疗诊断等各个领域都有广泛的应用。传统的图像监控控制图在图像很小的区域发生变化,并且这些区域的图像强度值变化很小的情况下,性能很差。如果图像中含有噪声,并且变化发生在图像对象的边缘附近,则其性能会变差。在制造业等应用中,图像的变化通常太小,人眼无法检测到。本文提出了一种用于灰度图像在线监测的cusum型控制图。根据我们希望检测的变化类型(大或小),我们建议使用本地CUSUM统计数据的某个上分位数。我们在提出的图表中结合了最先进的跳跃保持图像平滑技术,即使在低到中等噪声的存在下也能确保良好的性能。理论证明和数值比较的优越性能确保所提出的控制图对许多研究人员和实践者有用。
Journal of Applied Statistics, vol. 52, no. 11, pp. 2156-2171. Published 2025-01-27. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12404064/pdf/
Citations: 0
Derivation of a multivariate longitudinal causal effects model.
IF 1.1 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2025-01-24 eCollection Date: 2025-01-01 DOI: 10.1080/02664763.2025.2457013
Halima S Twabi, Samuel O M Manda, Dylan S Small, Hans-Peter Kohler

This paper presents a causal inference estimation method for longitudinal observational studies with multiple outcomes. The method uses marginal structural models with inverse probability of treatment weights (MSM-IPTWs). In developing the proposed method, we redefine the weights as a product of inverse weights at each time point, accounting for time-varying confounders and treatment exposures and for possible correlation between the multiple outcomes and within each outcome over time (serial correlation). The proposed method is evaluated by simulation studies and by an application that estimates the effect of HIV positivity awareness on condom use and multiple sexual partners using data from the Malawi Longitudinal Study of Families and Health (MLSFH). The simulation study shows that the joint MSM-IPTW performs well, with coverage within the expected 95% level, for a large sample size ($n = 1000$) and moderate to strong between- and within-outcome correlation ($\rho_j = 0.3, 0.75$; $\rho_k = 0.4, 0.8$) when the effects are similar. The joint MSM-IPTW performed about the same as the adjusted standard joint model when the treatment effect estimate was the same for the outcomes. In the application, HIV positivity awareness increased condom use and did not affect the number of sexual partners. We recommend using the proposed MSM-IPTWs to correctly control for time-varying treatment and confounders when estimating causal effects in longitudinal observational studies with multiple outcomes.
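The phrase "a product of inverse weights at each time point" can be made concrete with a small sketch of the generic, unstabilized inverse probability of treatment weight. This is not the paper's joint multi-outcome weighting: it assumes the per-time-point treatment probabilities have already been estimated (for example by a logistic model on treatment and confounder history), and the function names are hypothetical.

```python
def iptw_weight(treatments, propensities):
    """Unstabilized IPTW for one subject.

    treatments   : observed binary treatments a_1, ..., a_T (0 or 1)
    propensities : assumed estimates of P(A_t = 1 | history up to t)
    The weight is the product over t of 1 / P(A_t = a_t | history).
    """
    if len(treatments) != len(propensities):
        raise ValueError("treatments and propensities must have equal length")
    weight = 1.0
    for a_t, p_t in zip(treatments, propensities):
        if not 0.0 < p_t < 1.0:
            raise ValueError("propensities must lie strictly between 0 and 1")
        # Probability of the treatment that was actually received at time t.
        prob_observed = p_t if a_t == 1 else 1.0 - p_t
        weight *= 1.0 / prob_observed
    return weight


def weighted_mean(outcomes, weights):
    """Weighted outcome mean in the pseudo-population defined by the weights."""
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)
```

For example, a subject treated at the first visit and untreated at the second, with estimated treatment probabilities of 0.5 at both visits, receives weight `iptw_weight([1, 0], [0.5, 0.5])`, which is 1/0.5 × 1/0.5 = 4. Stabilized weights, which put a marginal treatment probability in the numerator, are commonly used to reduce variability.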

Journal of Applied Statistics, vol. 52, no. 12, pp. 2207-2225. Published 2025-01-24. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416008/pdf/
Citations: 0
Causal effect estimation for competing risk data in randomized trial: adjusting covariates to gain efficiency.
IF 1.1 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2025-01-24 eCollection Date: 2025-01-01 DOI: 10.1080/02664763.2025.2455626
Youngjoo Cho, Cheng Zheng, Lihong Qi, Ross L Prentice, Mei-Jie Zhang

The double-blinded randomized trial is considered the gold standard for estimating the average causal effect (ACE). The naive estimator that does not adjust for any covariate is consistent. However, incorporating covariates that are strong predictors of the outcome can reduce the problem of unbalanced covariate distributions between the treatment and control groups and can improve efficiency. Recent work has shown that, thanks to randomization, in linear regression an estimator of the regression coefficients under risk consistency (e.g. Random Forest) can maintain its convergence rate even when a nonparametric model is assumed for the effect of covariates. Moreover, such an adjusted estimator always yields an efficiency gain over the naive unadjusted estimator. In this paper, we extend this result to the competing risk data setting and show that, under similar assumptions, the adjusted estimator based on augmented inverse probability censoring weighting (AIPCW) has the same convergence rate and efficiency gain. Extensive simulations were performed to show the efficiency gain in finite samples. To illustrate the proposed method, we apply it to the Women's Health Initiative (WHI) dietary modification trial, which studies the effect of a low-fat diet on cardiovascular disease (CVD) related mortality among those with prior CVD.
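The efficiency gain from covariate adjustment can be seen in a much simpler setting than the one studied in the paper. The sketch below swaps in an uncensored continuous outcome with a single covariate and an ANCOVA-style linear adjustment; it involves no competing risks, no censoring, and no AIPCW. The data-generating model, the covariate coefficient of 3.0, and all function names are assumptions made for illustration.

```python
import random
from statistics import mean, pvariance


def unadjusted_effect(y, a):
    """Difference in mean outcome between treated (1) and control (0)."""
    treated = [yi for yi, ai in zip(y, a) if ai == 1]
    control = [yi for yi, ai in zip(y, a) if ai == 0]
    return mean(treated) - mean(control)


def adjusted_effect(y, a, x):
    """ANCOVA-style estimate: difference in means corrected for the
    chance imbalance in x, using the pooled within-arm slope of y on x."""
    arms = {0: ([], []), 1: ([], [])}
    for yi, ai, xi in zip(y, a, x):
        arms[ai][0].append(xi)
        arms[ai][1].append(yi)
    sxy = sxx = 0.0
    for xs, ys in arms.values():
        x_bar, y_bar = mean(xs), mean(ys)
        sxy += sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
        sxx += sum((xi - x_bar) ** 2 for xi in xs)
    slope = sxy / sxx
    (x0, y0), (x1, y1) = arms[0], arms[1]
    return (mean(y1) - mean(y0)) - slope * (mean(x1) - mean(x0))


def simulate(n_reps=200, n=100, effect=1.0, seed=0):
    """Repeat a simulated 1:1 randomized trial and collect both estimates."""
    rng = random.Random(seed)
    unadjusted, adjusted = [], []
    for _ in range(n_reps):
        a = [1] * (n // 2) + [0] * (n - n // 2)
        rng.shuffle(a)  # randomized treatment assignment
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Assumed model: the covariate is a strong predictor of the outcome.
        y = [effect * ai + 3.0 * xi + rng.gauss(0.0, 1.0)
             for ai, xi in zip(a, x)]
        unadjusted.append(unadjusted_effect(y, a))
        adjusted.append(adjusted_effect(y, a, x))
    return unadjusted, adjusted


if __name__ == "__main__":
    unadj, adj = simulate()
    print("variance, unadjusted:", pvariance(unadj))
    print("variance, adjusted:  ", pvariance(adj))
```

Both estimators target the same effect because treatment is randomized; the adjusted one should vary less across replications because it removes the outcome variation explained by the covariate.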

Journal of Applied Statistics, vol. 52, no. 11, pp. 2094-2112. Published 2025-01-24. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12404078/pdf/
Citations: 0