Journal of Applied Statistics: Latest Articles

Rollout designs for lump-sum data.
IF 1.1 | Zone 4 (Mathematics) | Q2 Statistics & Probability | Pub Date: 2024-12-13 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2024.2440031
Qunzhi Xu, Hongzhen Tian, Ananda Sarkar, Yajun Mei

This work studies rollout design problems, with a focus on suitable choices of the rollout rate under the standard framework of controlling Type I and Type II error probabilities. The main challenge of rollout design is that data are often observed in a lump-sum manner from a spatio-temporal point of view: (1) temporally, only the sum of the data in a given sliding time window can be observed; (2) spatially, there are two subgroups for the data at each time step, control and treatment, but one can only observe the total values rather than the individual values from each subgroup. We develop rollout tests for lump-sum data under both fixed-sample-size and sequential settings, subject to constraints on the Type I and Type II error probabilities. Numerical studies are conducted to validate our theoretical results.

Journal of Applied Statistics, vol. 52, no. 9, pp. 1777-1790. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12217103/pdf/
Citations: 0
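The lump-sum observation scheme described in the rollout-design abstract above can be illustrated with a small sketch. This is not the authors' procedure: the function name `lump_sum_observations`, the toy numbers, and the choice to pool the two subgroups by simple addition are assumptions made only to show what "sliding-window sums of pooled control and treatment data" might look like.

```python
import numpy as np


def lump_sum_observations(control, treatment, window):
    """Return sliding-window sums of the pooled (control + treatment) series.

    Hypothetical helper: the analyst never sees `control` or `treatment`
    separately, only the windowed totals returned here.
    """
    pooled = np.asarray(control, dtype=float) + np.asarray(treatment, dtype=float)
    # Convolving with a vector of ones sums each run of `window` values;
    # mode="valid" keeps only windows that lie fully inside the series.
    return np.convolve(pooled, np.ones(window), mode="valid")


# Toy per-step subgroup totals (illustrative values only).
control = [1.0, 2.0, 3.0, 4.0]
treatment = [0.5, 0.5, 1.5, 1.5]
observed = lump_sum_observations(control, treatment, window=2)
print(observed)
```

Any test built on such data has to work with `observed` alone, which is what makes the design problem harder than a standard two-sample comparison.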
Delaying bud-break on pecan trees: a Bayesian longitudinal multinomial regression approach.
IF 1.1 | Zone 4 (Mathematics) | Q2 Statistics & Probability | Pub Date: 2024-12-12 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2024.2436007
Dayna P Saldaña Zepeda, Richard Heerema, Ciro Velasco Cruz, William Giese, Joshua Sherman

A multivariate Bayesian probit model is adapted to analyze a longitudinal multiclass ordinal response, with a linear plateau as the longitudinal model. Measurements of pecan bud growth were collected at irregular time intervals, about a week apart, from late March to mid-April, using a six-level ordinal scale. The data are from two randomized complete block designs with four blocks each. The experiments were set up and initiated in 2018 in a pecan orchard, at two different locations, to evaluate the effect of two sets of four treatments on delaying the growth of recently broken pecan buds in order to minimize bud loss due to low temperatures. A simulation study was successfully carried out to validate the model implementation. Treatment 3 of Experiment 1 was associated with the greatest reduction in bud growth rate. In Experiment 2, Treatments 2 and 3 had some effect on delaying bud growth. Although treatment effects were not statistically different in either experiment, this paper presents a practical and efficient modeling technique for longitudinal multinomial ordinal data, a common data type in applied agricultural research studies.

Journal of Applied Statistics, vol. 52, no. 8, pp. 1649-1669. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12147487/pdf/
Citations: 0
Interval-valued scalar-on-function linear quantile regression based on the bivariate center and radius method.
IF 1.1 | Zone 4 (Mathematics) | Q2 Statistics & Probability | Pub Date: 2024-12-11 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2024.2440035
Kaiyuan Liu, Min Xu, Jiang Du, Tianfa Xie

Interval-valued functional data, a new type of data in symbolic data analysis, depict the characteristics of a variety of big data and have drawn the attention of many researchers. Mean regression is one of the important methods for analyzing interval-valued functional data. However, this method is sensitive to outliers and may lead to unreliable results. As an important complement to mean regression, this paper proposes an interval-valued scalar-on-function linear quantile regression model. Specifically, we construct two linear quantile regression models for the interval-valued response and interval-valued functional regressors based on the bivariate center and radius method. The proposed model is more robust and efficient than mean regression methods when the data contain outliers and the errors do not follow a normal distribution. Numerical simulations and a real data analysis of a climate dataset demonstrate the effectiveness of the proposed method and its superiority over existing methods.

Journal of Applied Statistics, vol. 52, no. 9, pp. 1791-1824. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12217117/pdf/
Citations: 0
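The "center and radius" representation named in the interval-valued regression abstract above re-expresses each interval by its midpoint and half-width, so that two ordinary regressions can be fitted. The sketch below shows only that change of coordinates; the function names `center_radius` and `to_interval` are hypothetical, and the quantile regression step itself is not reproduced here.

```python
import numpy as np


def center_radius(lower, upper):
    """Map interval endpoints [lower, upper] to (center, radius)."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    if np.any(upper < lower):
        raise ValueError("each interval needs lower <= upper")
    center = (lower + upper) / 2.0   # midpoint of the interval
    radius = (upper - lower) / 2.0   # half-width, always >= 0
    return center, radius


def to_interval(center, radius):
    """Inverse map: recover the endpoints from (center, radius)."""
    center = np.asarray(center, dtype=float)
    radius = np.asarray(radius, dtype=float)
    return center - radius, center + radius


# Toy intervals, e.g. daily minimum and maximum readings (illustrative values).
c, r = center_radius([1.0, 2.0], [3.0, 6.0])
lo, hi = to_interval(c, r)
print(c, r, lo, hi)
```

Predictions made separately for the center and the radius can be mapped back to an interval with `to_interval`, provided the predicted radius stays non-negative.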
Scalable Bayesian inference for Bradley-Terry models with ties: an application to honour based abuse.
IF 1.2 | Zone 4 (Mathematics) | Q2 Statistics & Probability | Pub Date: 2024-12-11 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2024.2436608
Rowland G Seymour, Fabian Hernandez

Honour-based abuse covers a wide range of family abuse, including female genital mutilation and forced marriage. Safeguarding professionals need to identify where abuses are happening in their local community to best support those at risk of these crimes and take preventative action. However, there is little local data about these kinds of crime. To tackle this problem, we ran comparative judgement surveys to map abuses at the local level, where participants were shown pairs of wards and asked which had a higher rate of honour-based abuse. In previous comparative judgement studies, participants reported fatigue associated with comparisons between areas with similar levels of abuse. Allowing for tied comparisons reduces fatigue, but increases the computational complexity of fitting the model. We designed an efficient Markov chain Monte Carlo algorithm to fit a model with ties, allowing for a wide range of prior distributions on the model parameters. Working with South Yorkshire Police and Oxford Against Cutting, we mapped the risk of honour-based abuse at the community level in two counties in the UK.

Journal of Applied Statistics, vol. 52, no. 9, pp. 1695-1712. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12217112/pdf/
Citations: 0
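The abstract above does not say which tie mechanism the authors use. As one possible illustration, the sketch below uses the Davidson extension of the Bradley-Terry model, in which a tie parameter scales the geometric mean of the two strengths. The function name `davidson_probs` and the log-scale parametrization are assumptions for this example, not details taken from the paper.

```python
import math


def davidson_probs(log_strength_i, log_strength_j, log_nu):
    """Outcome probabilities for one paired comparison under the Davidson model.

    Assumed parametrization: strengths pi = exp(log_strength), tie
    parameter nu = exp(log_nu). Returns (P(i chosen), P(j chosen), P(tie)).
    """
    pi_i = math.exp(log_strength_i)
    pi_j = math.exp(log_strength_j)
    # The tie weight grows with nu and is largest, relative to the win
    # weights, when the two strengths are similar.
    tie_weight = math.exp(log_nu) * math.sqrt(pi_i * pi_j)
    total = pi_i + pi_j + tie_weight
    return pi_i / total, pi_j / total, tie_weight / total


# Two wards with equal latent levels and nu = 1: all outcomes equally likely.
equal = davidson_probs(0.0, 0.0, 0.0)
# A clearly higher-rated ward i makes a tie less likely.
unequal = davidson_probs(3.0, 0.0, 0.0)
print(equal, unequal)
```

This matches the intuition in the abstract that ties are most natural when the two areas being compared have similar levels.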
Estimation for time-varying coefficient smoothed quantile regression.
IF 1.1 | Zone 4 (Mathematics) | Q2 Statistics & Probability | Pub Date: 2024-12-10 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2024.2440056
Lixia Hu, Jinhong You, Qian Huang, Shu Liu

Time-varying coefficient regression is commonly used in the modeling of nonstationary stochastic processes. In this paper, we consider a time-varying coefficient convolution-type smoothed quantile regression (conquer). The covariates and errors are assumed to belong to a general class of locally stationary processes. We propose a local linear conquer estimator for the varying-coefficient function, and obtain the global Bahadur-Kiefer representation, which yields the asymptotic normality. Furthermore, statistical inference on simultaneous confidence bands is also studied. We investigate the finite-sample performance of the conquer estimator and confirm the validity of our asymptotic theory by conducting extensive simulation studies. We also consider financial volatility data as an example of a real-world application.

Journal of Applied Statistics, vol. 52, no. 9, pp. 1825-1846. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12217113/pdf/
Citations: 0
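Convolution-type smoothing, as named in the abstract above, replaces the non-differentiable check loss with its convolution against a kernel of bandwidth h. The sketch below assumes a Gaussian kernel, for which the smoothed loss has a closed form; the kernel choice, the function names, and the bandwidth values are assumptions for illustration, and the time-varying, local linear part of the paper's estimator is not shown.

```python
import math


def check_loss(u, tau):
    """Standard quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def std_normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def smoothed_check_loss(u, tau, h):
    """Check loss convolved with a N(0, h^2) kernel (assumed Gaussian kernel).

    Uses rho_tau(u) = |u| / 2 + (tau - 1/2) * u: the linear term is unchanged
    by a symmetric kernel, and E|u + h Z| has the folded-normal closed form.
    """
    a = u / h
    smoothed_abs = h * (
        math.sqrt(2.0 / math.pi) * math.exp(-0.5 * a * a)
        + a * (1.0 - 2.0 * std_normal_cdf(-a))
    )
    return 0.5 * smoothed_abs + (tau - 0.5) * u


# With a tiny bandwidth the smoothed loss is numerically the check loss;
# with h = 1 the kink at zero is rounded off.
print(smoothed_check_loss(1.0, 0.3, 1e-3), check_loss(1.0, 0.3))
print(smoothed_check_loss(0.0, 0.5, 1.0))
```

Because the smoothed loss is differentiable everywhere for h > 0, gradient-based optimization becomes available, which is the usual motivation for this kind of smoothing.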
Integrative analysis of high-dimensional quantile regression with contrasted penalization.
IF 1.1 | Zone 4 (Mathematics) | Q2 Statistics & Probability | Pub Date: 2024-12-10 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2024.2438799
Panpan Ren, Xu Liu, Xiao Zhang, Peng Zhan, Tingting Qiu

In the era of big data, the simultaneous analysis of multiple high-dimensional, heavy-tailed datasets has become essential. Integrative analysis offers a powerful approach to combining and synthesizing information from these various datasets, often outperforming traditional meta-analysis and single-dataset analysis. In this paper, we introduce a novel high-dimensional integrative quantile regression that can accommodate the complexities inherent in multi-dataset analysis. A contrast penalty that smooths regression coefficients is introduced to account for across-dataset structures and improve variable selection. To ease the computational burden associated with high-dimensional quantile regression, a new algorithm is developed that is effective at computing solution paths and selecting significant variables. Monte Carlo simulations demonstrate its competitive performance. Additionally, the proposed method is applied to data from the China Health and Retirement Longitudinal Study, illustrating its practical utility in identifying influential factors affecting support income for the elderly. Findings indicate that adult children's individual characteristics and emotional comfort are primary factors of support income, and the extent of their impact varies across regions.

Journal of Applied Statistics, vol. 52, no. 9, pp. 1760-1776. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12217111/pdf/
Citations: 0
To impute or not? Testing multivariate normality on incomplete dataset: revisiting the BHEP test.
IF 1.1 | Zone 4 (Mathematics) | Q2 Statistics & Probability | Pub Date: 2024-12-09 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2024.2438798
Danijel G Aleksić, Bojana Milošević

In this paper, we focus on testing multivariate normality using the BHEP test with data that are missing completely at random. Our objective is twofold: first, to gain insight into the asymptotic behavior of the BHEP test statistics under two widely used approaches for handling missing data, namely complete-case analysis and imputation, and second, to compare the power performance of the test statistic under these approaches. Since the complete-case approach removes all elements of the sample with at least one missing component, it might lead to a loss of information. On the other hand, we note that when the test is performed on imputed data as if they were complete, the Type I error becomes severely distorted. To address these issues, we propose an appropriate bootstrap algorithm for approximating p-values. Extensive simulation studies demonstrate that both mean and median approaches exhibit greater power compared to testing with complete-case analysis, and open some questions for further research. The proposed methodology is illustrated with real-data examples.

Journal of Applied Statistics, vol. 52, no. 9, pp. 1742-1759. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12217108/pdf/
Citations: 0
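The two ways of handling missing values compared in the BHEP abstract above, complete-case analysis and imputation, can be contrasted on a toy matrix. The sketch assumes column-mean imputation and NaN-coded missing entries; the function names and data are hypothetical, and neither the BHEP statistic nor the authors' bootstrap is implemented here.

```python
import numpy as np


def complete_cases(x):
    """Keep only rows with no missing component (complete-case analysis)."""
    x = np.asarray(x, dtype=float)
    return x[~np.isnan(x).any(axis=1)]


def mean_impute(x):
    """Replace each missing entry by the mean of its column's observed values."""
    x = np.array(x, dtype=float)          # np.array makes a copy we can edit
    col_means = np.nanmean(x, axis=0)     # means computed ignoring NaNs
    rows, cols = np.where(np.isnan(x))
    x[rows, cols] = col_means[cols]
    return x


# Toy bivariate sample with two incomplete rows (illustrative values).
data = np.array([
    [1.0, 2.0],
    [np.nan, 4.0],
    [3.0, np.nan],
    [5.0, 6.0],
])
cc = complete_cases(data)
imputed = mean_impute(data)
print(cc.shape, imputed)
```

Here complete-case analysis discards half of the rows, while imputation keeps all four but inserts values that were never observed, which is consistent with the abstract's remark that treating imputed data as complete distorts the Type I error.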
A robust and efficient change point detection method for high-dimensional linear models.
IF 1.1 | Zone 4 (Mathematics) | Q2 Statistics & Probability | Pub Date: 2024-12-03 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2024.2436008
Zhong-Cheng Han, Kong-Sheng Zhang, Yan-Yong Zhao

In the context of linear models, a key problem of interest is to estimate the regression coefficient. Nevertheless, in certain instances, the vector of unknown coefficient parameters in a linear regression model differs from one segment to another. In this paper, for the case where the dimension of the covariates is high, a new method is proposed to examine a linear model in which the regression coefficients of two subpopulations may be different. To achieve robustness and efficiency, we introduce modal linear regression as a means of estimating the unknown coefficient parameters. Furthermore, our proposed method is capable of selecting variables and checking for change points. Under certain mild assumptions, the limiting behavior of our proposed method can be established. Additionally, an estimation algorithm based on the kick-one-off and SCAD approaches is developed for practical implementation. For illustration, simulation studies and a real data set are used to assess the performance of our proposed method.

Journal of Applied Statistics, vol. 52, no. 9, pp. 1671-1694. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12217119/pdf/
Citations: 0
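The change-point abstract above mentions SCAD for variable selection. For readers unfamiliar with it, the sketch below evaluates the standard SCAD penalty for a single coefficient. The default a = 3.7 is the conventional choice from the SCAD literature, not a value reported in the abstract, and the function name `scad_penalty` is hypothetical; the modal regression and kick-one-off steps are not shown.

```python
def scad_penalty(t, lam, a=3.7):
    """SCAD penalty for one coefficient t, with tuning parameters lam > 0, a > 2.

    Piecewise form: linear (lasso-like) near zero, quadratic in the middle,
    and constant for large |t| so that big coefficients are not over-shrunk.
    """
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    return lam * lam * (a + 1.0) / 2.0


# Small, medium and large coefficients under lam = 1 (illustrative values).
small = scad_penalty(0.5, 1.0)
medium = scad_penalty(2.0, 1.0)
large = scad_penalty(10.0, 1.0)
print(small, medium, large)
```

The flat region for large coefficients is the feature that distinguishes SCAD from the lasso penalty, which keeps growing linearly.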
A clustering approach to integrative analyses of multiomic cancer data.
IF 1.1 | Zone 4 (Mathematics) | Q2 Statistics & Probability | Pub Date: 2024-11-29 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2024.2431742
Dongyan Yan, Subharup Guha

Rapid technological advances have allowed for molecular profiling across multiple omics domains for clinical decision-making in many diseases, especially cancer. However, as tumor development and progression are biological processes involving composite genomic aberrations, key challenges are to effectively assimilate information from these domains to identify genomic signatures and druggable biological entities, develop accurate risk prediction profiles for future patients, and identify novel patient subgroups for tailored therapy and monitoring. We propose integrative frameworks for high-dimensional multiple-domain cancer data. These Bayesian mixture model-based approaches coherently incorporate dependence within and between domains to accurately detect tumor subtypes, thus providing a catalog of genomic aberrations associated with cancer taxonomy. The flexible and scalable Bayesian nonparametric strategy performs simultaneous bidirectional clustering of the tumor samples and genomic probes to achieve dimension reduction. We describe an efficient variable selection procedure that can identify relevant genomic aberrations and potentially reveal underlying drivers of disease. Although the work is motivated by lung cancer datasets, the proposed methods are broadly applicable in a variety of contexts involving high-dimensional data. The success of the methodology is demonstrated using artificial data and lung cancer omics profiles publicly available from The Cancer Genome Atlas.

Journal of Applied Statistics, vol. 52, no. 8, pp. 1539-1560. Journal Article. Published 2024-11-29. DOI: 10.1080/02664763.2024.2431742. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12147493/pdf/
Citations: 0
A robust Bayesian latent position approach for community detection in networks with continuous attributes.
IF 1.1 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2024-11-29 eCollection Date: 2025-01-01 DOI: 10.1080/02664763.2024.2431736
Zhumengmeng Jin, Juan Sosa, Shangchen Song, Brenda Betancourt

The increasing prevalence of multiplex networks has spurred a critical need to take into account potential dependencies across different layers, especially when the goal is community detection, which is a fundamental learning task in network analysis. We propose a full Bayesian mixture model for community detection in both single-layer and multi-layer networks. A key feature of our model is the joint modeling of the nodal attributes that often come with the network data as a spatial process over the latent space. In addition, our model for multi-layer networks allows layers to have different strengths of dependency in the unique latent position structure and assumes that the probability of a relation between two actors (in a layer) depends on the distances between their latent positions (multiplied by a layer-specific factor) and the difference between their nodal attributes. Under our prior specifications, the actors' positions in the latent space arise from a finite mixture of Gaussian distributions, each corresponding to a cluster. Simulated examples show that our model outperforms existing benchmark models and exhibits significantly greater robustness when handling datasets with missing values. The model is also applied to a real-world three-layer network of employees in a law firm.
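
As a rough illustration of the link structure this abstract describes (a tie probability that depends on the latent distance, scaled by a layer-specific factor, and on the difference in nodal attributes), the sketch below may help. It assumes a logistic link, Euclidean distance, and a single scalar attribute per actor; the abstract specifies none of these. The names `link_probability`, `layer_scale`, `intercept`, and `attr_weight` are hypothetical and are not taken from the paper.

```python
import math


def link_probability(z_i, z_j, x_i, x_j, layer_scale,
                     intercept=0.0, attr_weight=1.0):
    """Illustrative tie probability between actors i and j in one layer.

    Assumptions (not from the paper): logistic link, Euclidean latent
    distance, scalar nodal attribute, and a linear predictor of the form
    intercept - layer_scale * distance - attr_weight * |x_i - x_j|.
    """
    distance = math.dist(z_i, z_j)   # distance between latent positions
    attr_gap = abs(x_i - x_j)        # difference between nodal attributes
    eta = intercept - layer_scale * distance - attr_weight * attr_gap
    return 1.0 / (1.0 + math.exp(-eta))


# Same pair of actors, two layers with different (hypothetical) scale factors.
p_layer_a = link_probability((0.0, 0.0), (1.0, 1.0), 0.2, 0.5, layer_scale=0.5)
p_layer_b = link_probability((0.0, 0.0), (1.0, 1.0), 0.2, 0.5, layer_scale=2.0)
```

Under these assumptions, a larger `layer_scale` makes a layer more sensitive to latent distance, so `p_layer_b` is smaller than `p_layer_a` for the same pair of actors.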

Journal of Applied Statistics, vol. 52, no. 8, pp. 1513-1538. Journal Article. Published 2024-11-29. DOI: 10.1080/02664763.2024.2431736. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12147515/pdf/
Citations: 0