
Journal of the Korean Statistical Society: Latest Articles

Double data piling: a high-dimensional solution for asymptotically perfect multi-category classification
IF 0.6 | Zone 4, Mathematics | Q4 STATISTICS & PROBABILITY | Pub Date: 2024-04-03 | DOI: 10.1007/s42952-024-00263-6
Taehyun Kim, Woonyoung Chang, Jeongyoun Ahn, Sungkyu Jung

For high-dimensional classification, interpolation of training data manifests as the data piling phenomenon, in which linear projections of data vectors from each class collapse to a single value. Recent research has revealed an additional phenomenon known as the ‘second data piling’ for independent test data in binary classification, providing a theoretical understanding of asymptotically perfect classification. This paper extends these findings to multi-category classification and provides a comprehensive characterization of the double data piling phenomenon. We define the maximal data piling subspace, which maximizes the sum of pairwise distances between piles of training data in multi-category classification. Furthermore, we show that a second data piling subspace that induces data piling for independent data exists and can be consistently estimated by projecting the negatively-ridged discriminant subspace onto an estimated ‘signal’ subspace. By leveraging this second data piling phenomenon, we propose a bias-correction strategy for class assignments, which asymptotically achieves perfect classification. The present research sheds light on benign overfitting and enhances the understanding of perfect multi-category classification in high-dimensional discrimination with the help of high-dimensional asymptotics.
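The first data piling phenomenon described above can be illustrated with a small sketch. It assumes a binary problem with hypothetical sizes (6 training points in 50 dimensions) and simulated Gaussian data, and it computes a minimum-norm interpolating direction rather than the maximal data piling subspace defined in the paper. All function names are illustrative.

```python
import random

def solve(A, b):
    """Solve the square system A a = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    a = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * a[j] for j in range(i + 1, n))
        a[i] = (M[i][n] - s) / M[i][i]
    return a

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def piling_direction(X, y):
    """Minimum-norm w with X w = y, i.e. w = X^T (X X^T)^{-1} y.

    Requires more dimensions than observations and a full-rank X.
    """
    gram = [[dot(xi, xj) for xj in X] for xi in X]
    a = solve(gram, y)
    d = len(X[0])
    return [sum(a[i] * X[i][k] for i in range(len(X))) for k in range(d)]

random.seed(0)
n_per_class, d = 3, 50  # hypothetical sizes: dimension far exceeds sample size
X = [[random.gauss(0.5 if i < n_per_class else -0.5, 1.0) for _ in range(d)]
     for i in range(2 * n_per_class)]
y = [1.0] * n_per_class + [-1.0] * n_per_class

w = piling_direction(X, y)
projections = [dot(x, w) for x in X]  # each class collapses onto one value
```

Because the direction interpolates the labels, every training point of the first class projects to the same value and every point of the second class to another, which is the piling behaviour the abstract refers to. Independent test points would generally not pile along this direction, which is the issue the second data piling subspace addresses.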

Citations: 0
Nonparametric tests for combined location-scale and Lehmann alternatives using adaptive approach and max-type metric
IF 0.6 | Zone 4, Mathematics | Q4 STATISTICS & PROBABILITY | Pub Date: 2024-04-02 | DOI: 10.1007/s42952-024-00262-7

Abstract

The paper deals with the classical two-sample problem for the combined location-scale and Lehmann alternatives, known as the versatile alternative. Recently, a combination of the square of the standardized Wilcoxon, the standardized Ansari–Bradley and the standardized Anti-Savage statistics based on the Euclidean distance has been proposed. The Anti-Savage test is the locally most powerful rank test for the right-skewed Gumbel distribution. Furthermore, the Savage test is the locally most powerful linear rank test for the left-skewed Gumbel distribution. Then, a test statistic combining the Wilcoxon, the Ansari–Bradley, and Savage statistics is proposed. The limiting distribution of the proposed statistic is derived under the null and the alternative hypotheses. In addition, the asymptotic power of the suggested statistic is investigated. Moreover, an adaptive test is proposed based on a selection rule. We compare the power performance against various fixed alternatives using Monte Carlo. The proposed test statistic displays outstanding performance in certain situations. An illustration of the proposed test statistic is presented to explain a biomedical experiment. Finally, we offer some concluding remarks.
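The three rank statistics named in the abstract are all linear rank statistics, so they can be standardized with the same permutation mean and variance formulas. The sketch below assumes no ties and uses the usual score definitions (Wilcoxon: the rank itself; Ansari-Bradley: distance from the nearer end of the ordering; Savage: partial harmonic sums). The plain sum of squares at the end ignores the correlation between the components, so it is only an illustration and not the statistic proposed in the paper.

```python
from math import sqrt

def ranks_of_first_sample(x, y):
    """Ranks (1..N) of the x-values in the pooled sample; assumes no ties."""
    pooled = sorted(list(x) + list(y))
    return [pooled.index(v) + 1 for v in x]

def standardized_linear_rank_stat(x, y, score):
    """(T - E[T]) / sd[T] for T = sum of score(rank) over x, under the permutation null."""
    n1, n2 = len(x), len(y)
    N = n1 + n2
    a = [score(i, N) for i in range(1, N + 1)]
    a_bar = sum(a) / N
    T = sum(a[r - 1] for r in ranks_of_first_sample(x, y))
    var = n1 * n2 / (N * (N - 1)) * sum((ai - a_bar) ** 2 for ai in a)
    return (T - n1 * a_bar) / sqrt(var)

def wilcoxon(i, N):
    return float(i)

def ansari_bradley(i, N):
    return float(min(i, N + 1 - i))

def savage(i, N):
    # Savage score for rank i: 1/N + 1/(N-1) + ... + 1/(N-i+1)
    return sum(1.0 / j for j in range(N - i + 1, N + 1))

def naive_sum_of_squares(x, y):
    """Illustrative combination only: ignores correlation between the components."""
    return sum(standardized_linear_rank_stat(x, y, s) ** 2
               for s in (wilcoxon, ansari_bradley, savage))
```

For example, `standardized_linear_rank_stat([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], wilcoxon)` gives a clearly negative value because the first sample occupies the three lowest ranks, while the Ansari-Bradley component is zero because both samples have the same spread in rank terms.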

Citations: 0
Variants of non-symmetric correspondence analysis for nominal and ordinal variables
IF 0.6 | Zone 4, Mathematics | Q4 STATISTICS & PROBABILITY | Pub Date: 2024-03-23 | DOI: 10.1007/s42952-023-00253-0
Riya R. Jain, Kirtee K. Kamalja

Non-symmetric correspondence analysis (NSCA) is a multivariate data analysis technique that has gained increasing attention in recent years. NSCA is an extension of traditional correspondence analysis that allows for the analysis of asymmetric association between two or more categorical variables. NSCA involves graphically depicting the one-way relationship between variables cross-classified in a contingency table through a biplot. This paper provides a comprehensive overview of the popular approaches to NSCA developed over the years. Some fundamental variations in the NSCA family, such as Simple NSCA, Doubly Ordered NSCA, Singly Ordered NSCA, Three-way Nominal NSCA, and Triply Ordered NSCA, are discussed thoroughly. Systematic step-by-step algorithms for each variant of NSCA and their demonstrations are presented. Further, the key features of the paper are a summary of NSCA variants in the literature, a concise tabular presentation of R packages developed for variants of CA/NSCA, and a collection of datasets on which NSCA has been performed. Moreover, we compare and contrast NSCA with multinomial logistic regression (MNLR) to discuss some disparities between the two approaches. The paper aims to present the theoretical, practical, and computational issues of NSCA in a structured manner and to highlight further challenges for NSCA.
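Simple NSCA is commonly described as decomposing the numerator of the Goodman-Kruskal tau index, which measures how much the row (predictor) variable reduces the error in predicting the column (response) variable. The sketch below computes only that index for a two-way table. It does not perform the singular value decomposition or the biplot, and the function name is illustrative.

```python
def goodman_kruskal_tau(table):
    """Goodman-Kruskal tau for predicting the column variable from the row variable.

    `table` is a list of rows of non-negative counts.
    """
    total = float(sum(sum(row) for row in table))
    p = [[v / total for v in row] for row in table]
    n_rows, n_cols = len(p), len(p[0])
    row_m = [sum(row) for row in p]                                    # p_{i.}
    col_m = [sum(p[i][j] for i in range(n_rows)) for j in range(n_cols)]  # p_{.j}
    # Weighted squared distance between each row profile and the column margin.
    numerator = sum(
        row_m[i] * (p[i][j] / row_m[i] - col_m[j]) ** 2
        for i in range(n_rows) for j in range(n_cols) if row_m[i] > 0
    )
    denominator = 1.0 - sum(c ** 2 for c in col_m)
    return numerator / denominator
```

A table whose rows all share the same profile, such as `[[10, 20], [20, 40]]`, gives a value of zero, while a table in which the row determines the column, such as `[[10, 0], [0, 10]]`, gives one.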

Citations: 0
Finding an NARE whose minimal nonnegative solution represents first passage quantities in the two-dimensional Brownian motion
IF 0.6 | Zone 4, Mathematics | Q4 STATISTICS & PROBABILITY | Pub Date: 2024-03-22 | DOI: 10.1007/s42952-024-00261-8
Sung-Chul Hong, Soohan Ahn

The goal of this paper is to find a nonsymmetric algebraic Riccati equation (NARE) whose minimal nonnegative solution can represent the Laplace transform of the total increment of one component during the first passage time of the other in the two-dimensional Brownian motion. For that purpose, we construct a sequence of two-dimensional Markov modulated fluid flows that converges to the two-dimensional Brownian motion and then derive various approximation results relevant to the NARE of interest. This is preliminary research for investigating first-passage-related quantities in the two-dimensional Markov modulated Brownian motion, in which the parameters vary according to the states of an underlying Markov process.

Citations: 0
Two-sample test of stochastic block models via the maximum sampling entry-wise deviation
IF 0.6 | Zone 4, Mathematics | Q4 STATISTICS & PROBABILITY | Pub Date: 2024-03-03 | DOI: 10.1007/s42952-024-00260-9
Qianyong Wu, Jiang Hu

The paper discusses a statistical problem related to testing for differences between two networks with community structures. While existing methods have been proposed, they encounter challenges and do not perform effectively when the networks become sparse. We propose a test statistic that combines a method proposed by Wu and Hu (2024) and a resampling process. Specifically, the proposed test statistic proves effective under the condition that the community-wise edge probability matrices have entries of order $\Omega(\log n/n)$, where $n$ denotes the network size. We derive the asymptotic null distribution of the test statistic and provide a guarantee of asymptotic power against the alternative hypothesis. To evaluate the performance of the proposed test statistic, we conduct simulations and provide real data examples. The results indicate that the proposed test statistic performs well for both dense and sparse networks.

Citations: 0
Jackknife model averaging for linear regression models with missing responses
IF 0.6 | Zone 4, Mathematics | Q4 STATISTICS & PROBABILITY | Pub Date: 2024-02-19 | DOI: 10.1007/s42952-024-00259-2
Jie Zeng, Weihu Cheng, Guozhi Hu

We consider the model averaging estimation problem in the linear regression model with missing response data, allowing for model misspecification. Based on the ‘complete’ data set for the response variable after inverse propensity score weighted imputation, we construct a leave-one-out cross-validation criterion for allocating model weights, where the propensity score model is estimated by the covariate balancing propensity score method. We derive some theoretical results to justify the proposed strategy. Firstly, when all candidate outcome regression models are misspecified, our procedures are proved to achieve optimality in terms of asymptotically minimizing the squared loss. Secondly, when the true outcome regression model is among the set of candidate models, the resulting model averaging estimators of the regression parameters are shown to be root-$n$ consistent. Simulation studies provide evidence of the superiority of our methods over other existing model averaging methods, even when the propensity score model is misspecified. As an illustration, the approach is further applied to study the CD4 data.

Citations: 0
A self-normalization test for structural breaks in a regression model for panel data sets
IF 0.6 | Zone 4, Mathematics | Q4 STATISTICS & PROBABILITY | Pub Date: 2024-02-15 | DOI: 10.1007/s42952-024-00255-6
Ji-Eun Choi, Dong Wan Shin

We construct a new structural break test in a panel regression model using the self-normalization method. The self-normalization test is shown to be superior to an existing test in that the former is theoretically and experimentally valid for regression models with serially and/or cross-sectionally correlated errors while the latter is not. We derive the asymptotic null distribution of the self-normalization test and its consistency under an alternative hypothesis. Unlike the existing test requiring bootstrap computation for critical values, the self-normalization test is implemented easily with a set of simple critical values. A Monte Carlo experiment reports that the self-normalization resolves the severe over-size problem of the existing test under serial and/or cross-sectional error correlation.

Citations: 0
Gradient-based kernel variable selection for support vector hazards machine
IF 0.6 | Zone 4, Mathematics | Q4 STATISTICS & PROBABILITY | Pub Date: 2024-02-15 | DOI: 10.1007/s42952-024-00256-5
Sanghun Jeong, Kyungjun Kang, Hojin Yang

This study aims to simultaneously improve the predictive performance for the event time through a machine learning model and find informative variables in time-to-event data. To address this issue, after regarding the time-to-event data as dichotomized counting process data for predicting survival time, we consider the time-dependent support vector machine (SVM) framework for the dichotomized counting process data, where the decision function in this framework consists of a time-independent risk score and a time-dependent intercept. Also, we consider the empirical partial derivative of the risk score function with respect to each marginal predictor as the indicator of an important predictor. Through this approach, it is possible to predict survival time and find the variables that affect survival time at the same time. Simulation studies were conducted to confirm the performance of the model, and real data analysis was conducted by predicting survival time after a lung cancer diagnosis and selecting genes associated with lung cancer from human gene data.

Citations: 0
Testing for conditional independence of survival time from covariate
IF 0.6 | Zone 4, Mathematics | Q4 STATISTICS & PROBABILITY | Pub Date: 2024-02-14 | DOI: 10.1007/s42952-024-00257-4
Minjung Kwak

This study examined the test of independence of survival time from a covariate in a more general setting using empirical process techniques. Previous research has been extended in several ways: (1) allowing incomplete observation owing to censoring, (2) allowing a time-dependent covariate, (3) allowing a non-uniform covariate, and (4) proving the validity of the weighted bootstrap for implementing the proposed testing procedure. Certain classes of test statistics that are functionals of a natural empirical process were studied, and the limiting distribution of these statistics was then derived using the functional delta method. The limiting distributions included some linear functionals of zero-mean tight Brownian bridges under the null hypothesis, and the tests were consistent against general alternatives. Tests implemented using the weighted bootstrap were shown to be valid. The proposals are illustrated via simulation studies and an application to acute leukemia data.

Citations: 0
Asymptotic results of error density estimator in nonlinear autoregressive models
IF 0.6 Zone 4, Mathematics Q4 STATISTICS & PROBABILITY Pub Date: 2024-02-13 DOI: 10.1007/s42952-024-00258-3
Shipeng Wu, Wenzhi Yang, Min Gao, Hongyan Fang

This paper considers the asymptotic properties of the error density estimator in nonlinear autoregressive models with α-mixing errors. The asymptotic distribution and the uniform convergence rate of the kernel density estimator of the error density function are obtained. Finally, simulations of histograms, confidence intervals, and mean integrated squared errors are presented, and they agree with the theoretical results.
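
As a rough illustration of the kind of estimator the abstract describes, the sketch below fits a simple nonlinear autoregressive model and applies a Gaussian kernel density estimator to the residuals. Everything specific here is an assumption made for illustration and is not taken from the paper: the model X_t = θ·sin(X_{t-1}) + e_t, the i.i.d. standard normal errors (a simple special case of α-mixing), the least-squares fit, the rule-of-thumb bandwidth, and the function names `simulate_nar`, `fit_theta`, and `residual_kde`.

```python
import numpy as np


def simulate_nar(n, theta, rng, burn_in=200):
    """Simulate X_t = theta * sin(X_{t-1}) + e_t with i.i.d. N(0, 1) errors.

    The model and error law are illustrative assumptions, not the paper's setup.
    """
    e = rng.standard_normal(n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        x[t] = theta * np.sin(x[t - 1]) + e[t]
    return x[burn_in:]  # drop the burn-in so the series is close to stationary


def fit_theta(x):
    """Least-squares estimate of theta (closed form, since the model is linear in theta)."""
    g = np.sin(x[:-1])
    return float(g @ x[1:] / (g @ g))


def residual_kde(residuals, grid, bandwidth=None):
    """Gaussian kernel density estimate of the error density, evaluated on `grid`."""
    n = residuals.size
    if bandwidth is None:
        # Normal-reference rule of thumb; an assumed default, not the paper's choice.
        bandwidth = 1.06 * residuals.std(ddof=1) * n ** (-1 / 5)
    u = (grid[:, None] - residuals[None, :]) / bandwidth
    return np.exp(-0.5 * u**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))


rng = np.random.default_rng(0)
x = simulate_nar(2000, theta=0.8, rng=rng)
theta_hat = fit_theta(x)

# Residuals stand in for the unobserved errors e_t.
residuals = x[1:] - theta_hat * np.sin(x[:-1])

grid = np.linspace(-5, 5, 401)
density = residual_kde(residuals, grid)
```

Because the true errors are unobserved, the density estimate is built from residuals, so its accuracy depends on both the parameter estimate and the bandwidth. Comparing `density` with the standard normal density on `grid` gives a quick check in this simulated setting.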
Citations: 0