Journal of the Korean Statistical Society: Latest Articles

Classification of repeated measurements using bias corrected Euclidean distance discriminant function

IF 0.6 | Zone 4, Mathematics | Q4 STATISTICS & PROBABILITY

Journal of the Korean Statistical Society

Pub Date: 2023-12-12 | DOI: 10.1007/s42952-023-00246-z

Edward Kanuti Ngailo, Saralees Nadarajah

This paper introduces a novel approach for approximating the misclassification probabilities of the Euclidean distance classifier when the group means have a bilinear structure, as in the growth curve model first proposed by Potthoff and Roy (Biometrika 51:313–326, 1964). Using certain statistical relationships, we first establish two general results for the improved Euclidean discriminant function under both weighted and unweighted growth curve mean structures. We then derive approximations of the expected misclassification probabilities based on the distribution of the improved Euclidean discriminant function. We also compare the misclassification probabilities of the improved Euclidean discriminant function, the standard Euclidean discriminant function, and the linear discriminant function. Notably, when the mean structure is weighted, both the improved and the standard Euclidean discriminant functions classify better as the number of repeated measurements increases, because more measurements carry more information, whereas the linear discriminant function performs well with a smaller number of repeated measurements. Furthermore, we evaluate the accuracy of the proposed approximations through Monte Carlo simulations.
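
The idea behind a bias-corrected Euclidean rule can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' discriminant function or their misclassification approximations: it assumes two groups whose means follow the growth curve form mu_i = B xi_i with a polynomial within-individual design B, estimates the means by an unweighted (ordinary least squares) projection P = B(B'B)^{-1}B', and subtracts tr(PS)/n_i, an estimate of the bias of each squared distance, where S is the pooled sample covariance. The weighted mean structure is not covered, and the function names `growth_curve_design`, `fit_groups` and `bias_corrected_euclid` are hypothetical.

```python
import numpy as np


def growth_curve_design(times, degree):
    """Within-individual design matrix B (p x q) with polynomial columns 1, t, t^2, ..."""
    return np.vander(np.asarray(times, dtype=float), degree + 1, increasing=True)


def fit_groups(X1, X2, B):
    """Unweighted (OLS) growth-curve fit of two group means plus pooled covariance S."""
    P = B @ np.linalg.solve(B.T @ B, B.T)  # projection onto the column space of B
    n1, n2 = len(X1), len(X2)
    S = ((n1 - 1) * np.cov(X1, rowvar=False)
         + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    return {"P": P, "m1": P @ X1.mean(axis=0), "m2": P @ X2.mean(axis=0),
            "S": S, "n1": n1, "n2": n2}


def bias_corrected_euclid(x, fit):
    """Return W(x); W(x) > 0 assigns x to group 1.

    Each squared distance ||x - P xbar_i||^2 is reduced by tr(P S) / n_i,
    an estimate of its bias tr(P Sigma) / n_i.
    """
    bias = np.trace(fit["P"] @ fit["S"])
    d1 = np.sum((x - fit["m1"]) ** 2, axis=-1) - bias / fit["n1"]
    d2 = np.sum((x - fit["m2"]) ** 2, axis=-1) - bias / fit["n2"]
    return d2 - d1


# Usage: five equally spaced occasions, linear growth curves in each group.
rng = np.random.default_rng(0)
B = growth_curve_design(np.arange(5.0), degree=1)
mu1, mu2 = B @ np.array([0.0, 1.0]), B @ np.array([1.0, 0.0])
X1 = mu1 + rng.standard_normal((20, 5))
X2 = mu2 + rng.standard_normal((30, 5))
fit = fit_groups(X1, X2, B)
new_obs = mu1 + rng.standard_normal((3, 5))
print(bias_corrected_euclid(new_obs, fit) > 0)  # True means "assign to group 1"
```

For a fixed x and a mean lying in the column space of B, E||x - P xbar_i||^2 = ||x - mu_i||^2 + tr(P Sigma)/n_i, so the correction matters when the group sizes differ; with n1 = n2 the two bias terms cancel and the rule reduces to the standard Euclidean classifier.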

Citations: 0

Sparse functional linear models via calibrated concave-convex procedure

IF 0.6 | Zone 4, Mathematics | Q4 STATISTICS & PROBABILITY

Journal of the Korean Statistical Society

Pub Date: 2023-12-03 | DOI: 10.1007/s42952-023-00242-3

Young Joo Lee, Yongho Jeon

In this paper, we propose a calibrated ConCave-Convex Procedure (CCCP) for variable selection in high-dimensional functional linear models. In linear models, the calibrated CCCP approach for the Smoothly Clipped Absolute Deviation (SCAD) penalty is known to produce a consistent solution path with probability converging to one. We incorporate the SCAD penalty into function-on-scalar regression models and, using a basis expansion approach, formulate them as a group-penalized estimation problem. We then apply the calibrated CCCP method to solve the resulting nonconvex group-penalized problem. For tuning, we use the Extended Bayesian Information Criterion (EBIC) to ensure consistency in high-dimensional settings. In simulation studies, we compare the proposed method with two existing convex-penalized estimators in terms of variable selection consistency and prediction accuracy. Finally, we apply the method to a gene expression dataset to obtain sparse estimates of the time-varying effects of transcription factors on the regulation of yeast cell cycle genes.
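
For readers unfamiliar with CCCP, the NumPy sketch below shows the basic, uncalibrated procedure for the SCAD penalty in an ordinary linear model with scalar coefficients; the functional and group-penalized setting, the calibration step and EBIC tuning are not reproduced. SCAD is split as lam*|b| plus a concave remainder, and each CCCP step replaces the remainder by its linearization at the current iterate, which leaves a lasso problem with an extra linear term, solved here by coordinate descent. The value a = 3.7 is the common SCAD default, and the helper names are hypothetical.

```python
import numpy as np


def scad_derivative(t, lam, a=3.7):
    """p'_lam(|t|) for the SCAD penalty (Fan and Li, 2001)."""
    t = np.abs(t)
    return np.where(t <= lam, lam, np.maximum(a * lam - t, 0.0) / (a - 1.0))


def lasso_with_linear_term(X, y, lam, c, beta, n_iter=500, tol=1e-10):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1 + c'b."""
    n, p = X.shape
    beta = beta.copy()
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ beta
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            z = X[:, j] @ r / n + col_sq[j] * beta[j] - c[j]
            new = np.sign(z) * max(abs(z) - lam, 0.0) / col_sq[j]  # soft-thresholding
            step = new - beta[j]
            if step != 0.0:
                r -= X[:, j] * step
                beta[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return beta


def cccp_scad(X, y, lam, a=3.7, n_outer=50, tol=1e-8):
    """Plain CCCP for SCAD-penalized least squares, started from the lasso fit."""
    p = X.shape[1]
    beta = lasso_with_linear_term(X, y, lam, np.zeros(p), np.zeros(p))
    for _ in range(n_outer):
        # Gradient of the concave part p_lam(|b|) - lam * |b| at the current iterate.
        c = (scad_derivative(beta, lam, a) - lam) * np.sign(beta)
        new_beta = lasso_with_linear_term(X, y, lam, c, beta)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


# Usage: three active coefficients out of ten.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
beta_true = np.array([3.0, -2.0, 2.0] + [0.0] * 7)
y = X @ beta_true + 0.5 * rng.standard_normal(200)
beta_hat = cccp_scad(X, y, lam=0.3)
```

Once every selected coefficient exceeds a*lam in absolute value, the linear term cancels the l1 shrinkage, so the nonzero estimates coincide with least squares on the selected variables; this near-unbiasedness is what distinguishes SCAD from the lasso.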

Citations: 0

Nonparametric longitudinal regression model to analyze shape data using the Procrustes rotation

IF 0.6 | Zone 4, Mathematics | Q4 STATISTICS & PROBABILITY

Journal of the Korean Statistical Society

Pub Date: 2023-12-03 | DOI: 10.1007/s42952-023-00241-4

Meisam Moghimbeygi, Mousa Golalizadeh

Shape, as an intrinsic concept, can serve as a source of information in some statistical analyses. For instance, an important topic in morphology is the study of how shapes change over time. From a topological viewpoint, shape data are points on a particular manifold, so constructing a longitudinal model of shape variation is less straightforward than it might seem. Rather than using common parametric models for this task, we invoke Procrustes analysis within a nonparametric framework and propose a simple yet useful model for shape changes. After casting the problem as a nonparametric regression model, we use the weighted least squares method to estimate the related parameters. We illustrate the new model through simulation studies and analyses of two biological data sets. Compared with competing models, the proposed model performs better.
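
The Procrustes step at the core of such models can be illustrated with a short NumPy sketch. It assumes landmark configurations stored as k x m arrays, removes location and scale, and obtains the optimal rotation from the singular value decomposition of X'Y, with a sign correction that excludes reflections; the returned distance is the partial Procrustes distance between unit-size configurations. The kernel-weighted mean shape at time t0 is only a simple weighted least squares illustration with a user-chosen Gaussian bandwidth h, not the authors' regression model, and all function names are hypothetical.

```python
import numpy as np


def center_and_scale(X):
    """Remove location and scale from a k x m landmark configuration."""
    Xc = X - X.mean(axis=0)
    return Xc / np.linalg.norm(Xc)


def procrustes_rotation(X, Y):
    """Rotation R (det R = +1) minimising ||X R - Y||_F for centred configurations."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    D = np.eye(X.shape[1])
    D[-1, -1] = np.sign(np.linalg.det(U @ Vt))  # exclude reflections
    return U @ D @ Vt


def procrustes_distance(X, Y):
    """Partial Procrustes distance between two shapes and the fitted rotation."""
    Xs, Ys = center_and_scale(X), center_and_scale(Y)
    R = procrustes_rotation(Xs, Ys)
    return np.linalg.norm(Xs @ R - Ys), R


def kernel_weighted_mean_shape(shapes, times, t0, h, ref):
    """Gaussian-kernel weighted average at time t0 of shapes rotated onto ref.

    The weighted average minimises sum_i w_i ||A_i - mu||^2 (weighted least squares).
    """
    w = np.exp(-0.5 * ((np.asarray(times, dtype=float) - t0) / h) ** 2)
    w /= w.sum()
    ref_s = center_and_scale(ref)
    aligned = []
    for S in shapes:
        Ss = center_and_scale(S)
        aligned.append(Ss @ procrustes_rotation(Ss, ref_s))
    return np.tensordot(w, np.stack(aligned), axes=1)


# Usage: Y is a rotated, scaled and translated copy of X.
rng = np.random.default_rng(2)
X = rng.standard_normal((6, 2))
theta = 0.7
R0 = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
Y = 2.5 * X @ R0 + np.array([1.0, -3.0])
dist, R = procrustes_distance(X, Y)
print(dist)  # close to zero
```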

Citations: 0

Variable selection for semiparametric accelerated failure time models with nonignorable missing data

IF 0.6 | Zone 4, Mathematics | Q4 STATISTICS & PROBABILITY

Journal of the Korean Statistical Society

Pub Date: 2023-11-19 | DOI: 10.1007/s42952-023-00238-z

Tianqing Liu, Xiaohui Yuan, Liuquan Sun

The regularization approach to variable selection has been well developed for semiparametric accelerated failure time (AFT) models, in which the response variable is right censored. In the presence of missing data, this approach needs to be tailored to the missing data mechanism at hand. In this paper, we propose a flexible and generally applicable missing data mechanism for AFT models that accommodates both ignorable and nonignorable missingness assumptions. Under this mechanism, we propose weighted rank (WR) estimators of the regression parameters and the corresponding penalized estimators. An advantage of the WR estimators and their penalized counterparts is that they do not require specifying a missing data model for the proposed mechanism. We establish the theoretical properties of the WR estimators and the corresponding penalized estimators. Comprehensive simulation studies and a real data application further demonstrate the merits of our approach.
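
Rank estimators for AFT models are often built from Gehan-type losses. The sketch below, using NumPy and SciPy, minimizes a weighted Gehan loss in which each pair (i, j) receives weight w_i * w_j; the weights are assumed to be supplied by the user (for example, inverse probabilities of being fully observed, with zero for incomplete cases). How the paper constructs its weights under the nonignorable mechanism, and its penalized estimators, are not reproduced. Nelder-Mead is used here only for simplicity on a convex, piecewise-linear loss with two coefficients; linear programming is a common alternative. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def weighted_gehan_loss(beta, log_time, delta, X, w):
    """Weighted Gehan loss for the AFT model log T = X beta + error.

    log_time: observed log times min(log T, log C); delta: 1 if uncensored.
    Pair (i, j) gets weight w_i * w_j; subjects with w_i = 0 drop out.
    """
    e = log_time - X @ beta                 # residuals e_i(beta)
    diff = e[:, None] - e[None, :]          # e_i - e_j
    pair_w = (w * delta)[:, None] * w[None, :]
    return np.sum(pair_w * np.maximum(-diff, 0.0)) / len(e) ** 2


def fit_weighted_gehan(log_time, delta, X, w):
    """Minimise the convex, piecewise-linear loss; start from naive least squares."""
    beta0 = np.linalg.lstsq(X, log_time, rcond=None)[0]
    res = minimize(weighted_gehan_loss, beta0, args=(log_time, delta, X, w),
                   method="Nelder-Mead",
                   options={"xatol": 1e-6, "fatol": 1e-12, "maxiter": 5000})
    return res.x


# Usage: simulated right-censored data with about 30% censoring, unit weights.
rng = np.random.default_rng(3)
n = 300
X = rng.standard_normal((n, 2))
beta_true = np.array([1.0, -0.5])
log_t = X @ beta_true + 0.5 * rng.standard_normal(n)
log_c = rng.uniform(-1.0, 3.0, n)
log_time = np.minimum(log_t, log_c)
delta = (log_t <= log_c).astype(float)
beta_hat = fit_weighted_gehan(log_time, delta, X, np.ones(n))
print(beta_hat)
```

No intercept is included because rank-based losses depend only on differences of residuals, so an intercept is not identified.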

Citations: 0

Robust and Efficient derivative estimation under correlated errors

IF 0.6 | Zone 4, Mathematics | Q4 STATISTICS & PROBABILITY

Journal of the Korean Statistical Society

Pub Date: 2023-11-18 | DOI: 10.1007/s42952-023-00240-5

Deru Kong, Wei Shen, Shengli Zhao, WenWu Wang

Correlated data are commonly encountered in real applications, and many techniques have been proposed to model them. However, these techniques have emphasized estimating the mean function under correlated errors, with scant attention paid to derivative estimation. In this paper, we propose locally weighted least squares regression based on different difference quotients to estimate derivatives of various orders under correlated errors. We derive the asymptotic bias and variance of the proposed estimators under errors with different covariance structures; compared with traditional methods, the estimators dramatically reduce the estimation variance. Furthermore, we establish their asymptotic normality, which allows confidence intervals to be constructed. Based on the asymptotic mean integrated squared error, we provide a data-driven criterion for selecting the tuning parameters. Simulation studies show that the proposed method is more robust and efficient than four other popular methods. Finally, we illustrate the usefulness of the proposed method with a real data example.
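
A basic building block of difference-quotient methods is the weighted symmetric difference quotient. The NumPy sketch below estimates m'(x_i) at interior design points by averaging (Y_{i+j} - Y_{i-j}) / (x_{i+j} - x_{i-j}) over j = 1, ..., k with weights w_j = 6 j^2 / (k(k+1)(2k+1)), which minimize the variance for equally spaced designs with i.i.d. errors. It is not the proposed locally weighted least squares estimator: it makes no adjustment for correlated errors, skips boundary points, and leaves the choice of k to the user. The function name is hypothetical.

```python
import numpy as np


def diff_quotient_derivative(x, y, k):
    """Weighted symmetric difference-quotient estimate of m'(x_i).

    Estimates are returned only for interior points i = k, ..., n - k - 1
    (0-based); boundary points are not handled in this sketch.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    j = np.arange(1, k + 1)
    w = 6.0 * j**2 / (k * (k + 1) * (2 * k + 1))  # weights sum to one
    idx = np.arange(k, len(x) - k)
    up = idx[:, None] + j[None, :]
    down = idx[:, None] - j[None, :]
    quotients = (y[up] - y[down]) / (x[up] - x[down])  # shape (n - 2k, k)
    return x[idx], quotients @ w


# Usage: noisy sine curve on an equally spaced grid.
rng = np.random.default_rng(4)
x = np.linspace(0.0, 1.0, 501)
y = np.sin(2 * np.pi * x) + 0.01 * rng.standard_normal(x.size)
xi, d1 = diff_quotient_derivative(x, y, k=10)
```

Each symmetric quotient is exact for quadratic functions, so the bias comes from third and higher derivatives, while a larger k lowers the variance; with correlated errors, neighbouring quotients share noise, which changes this trade-off.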

Citations: 0

Asymptotic bias of the $\ell_2$-regularized error variance estimator