Pub Date: 2023-12-18 DOI: 10.1007/s42952-023-00243-2
Su Jin Jeong, Hyo-jung Lee, Soong Deok Lee, Su Jeong Park, Seung Hwan Lee, Jae Won Lee
Genetic evidence, especially evidence based on short tandem repeats, is of paramount importance for human identification in forensic inference. In recent years, kinship identification using DNA evidence has drawn much attention in various fields. In forensics in particular, it is used with a criminal database to confirm blood relations. The interpretation of the likelihood ratio when identifying an individual or a relationship depends on the allele frequencies used, so an accurate estimate of allele frequency is crucial. Organizations in Korea, such as the Supreme Prosecutors’ Office and the Korean National Police Agency, provide different statistical interpretations because their allele frequency estimates differ, which can lead to confusion in forensic identification. Accurate estimation of allele frequency is therefore very important, and it requires a sufficient amount of information. However, simply using a weighted average for each allele frequency may not be sufficient to determine biological independence. In this study, we propose a new statistical method for estimating allele frequency by integrating the data obtained from several organizations, and we analyze biological independence and differences in allele frequency relative to the weighted average of allele frequencies in various subgroups. Finally, our proposed method is illustrated using real data from 576 Korean individuals.
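The weighted average that the abstract uses as its baseline can be sketched as follows. This is only an illustration of a sample-size-weighted pooling of per-organization allele frequencies, not the integration method proposed in the paper; the function name `pooled_allele_frequencies`, the organization labels, and all numbers are hypothetical.

```python
from collections import Counter


def pooled_allele_frequencies(freqs_by_org, sample_sizes):
    """Sample-size-weighted average of per-organization allele frequencies."""
    total = sum(sample_sizes[org] for org in freqs_by_org)
    pooled = Counter()
    for org, freqs in freqs_by_org.items():
        weight = sample_sizes[org] / total  # share of all sampled individuals
        for allele, p in freqs.items():
            pooled[allele] += weight * p
    return dict(pooled)


# Hypothetical frequencies for one STR locus reported by two organizations.
freqs_by_org = {
    "org_a": {"12": 0.30, "13": 0.50, "14": 0.20},
    "org_b": {"12": 0.40, "13": 0.40, "14": 0.20},
}
sample_sizes = {"org_a": 300, "org_b": 100}  # hypothetical sample sizes

pooled = pooled_allele_frequencies(freqs_by_org, sample_sizes)
```

Because the weights sum to one, the pooled frequencies at a locus still sum to one. As the abstract notes, this kind of pooling alone says nothing about whether the combined data satisfy biological independence.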
Title: Statistical integration of allele frequencies from several organizations (Journal of the Korean Statistical Society)
Pub Date: 2023-12-12 DOI: 10.1007/s42952-023-00246-z
Edward Kanuti Ngailo, Saralees Nadarajah
This paper introduces a novel approach for approximating misclassification probabilities in the Euclidean distance classifier when the group means exhibit a bilinear structure, as in the growth curve model first proposed by Potthoff and Roy (Biometrika 51:313–326, 1964). First, by leveraging certain statistical relationships, we establish two general results for the improved Euclidean discriminant function in both weighted and unweighted growth curve mean structures. We derive approximations for the expected misclassification probabilities with respect to the distribution of the improved Euclidean discriminant function. Additionally, we compare the misclassification probabilities of the improved Euclidean discriminant function, the standard Euclidean discriminant function, and the linear discriminant function. When the mean structure is weighted, a larger number of repeated measurements yields better classification results with the improved and standard Euclidean discriminant functions, since more information is acquired, whereas the linear discriminant function performs well with a smaller number of repeated measurements. Furthermore, we evaluate the accuracy of the suggested approximations by Monte Carlo simulations.
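For orientation, the standard two-group Euclidean distance rule can be sketched as below. This is the plain textbook rule (assign an observation to the group whose mean is nearer), not the improved, bias-corrected function studied in the paper; the function names and the toy means are hypothetical.

```python
def euclidean_discriminant(x, mean1, mean2):
    """Return (x - (mean1 + mean2) / 2) . (mean1 - mean2).

    The value is positive exactly when x is closer to mean1 than to mean2
    in Euclidean distance.
    """
    return sum(
        (xi - 0.5 * (m1 + m2)) * (m1 - m2)
        for xi, m1, m2 in zip(x, mean1, mean2)
    )


def classify(x, mean1, mean2):
    """Assign x to group 1 or group 2 by the sign of the discriminant."""
    return 1 if euclidean_discriminant(x, mean1, mean2) > 0 else 2


# Hypothetical group means for three repeated measurements.
mean1 = [1.0, 1.0, 1.0]
mean2 = [5.0, 5.0, 5.0]

label_a = classify([2.0, 1.0, 2.0], mean1, mean2)
label_b = classify([4.0, 5.0, 4.0], mean1, mean2)
```

The rule ignores the covariance matrix, which is what distinguishes it from the linear discriminant function mentioned in the abstract. In practice the group means would be replaced by estimates.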
Title: Classification of repeated measurements using bias corrected Euclidean distance discriminant function (Journal of the Korean Statistical Society)
Pub Date: 2023-12-03 DOI: 10.1007/s42952-023-00242-3
Young Joo Lee, Yongho Jeon
In this paper, we propose a calibrated ConCave-Convex Procedure (CCCP) for variable selection in high-dimensional functional linear models. The calibrated CCCP approach for the Smoothly Clipped Absolute Deviation (SCAD) penalty is known to produce a consistent solution path with probability converging to one in linear models. We incorporate the SCAD penalty into function-on-scalar regression models and formulate them as a type of group-penalized estimation using a basis expansion approach. We then implement the calibrated CCCP method to solve the nonconvex group-penalized problem. For the tuning procedure, we use the Extended Bayesian Information Criterion (EBIC) to ensure consistency in high-dimensional settings. In simulation studies, we compare the performance of the proposed method with two existing convex-penalized estimators in terms of variable selection consistency and prediction accuracy. Lastly, we apply the method to a gene expression dataset to sparsely estimate the time-varying effects of transcription factors on the regulation of yeast cell cycle genes.
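As a reminder of what the SCAD penalty looks like, here is a minimal sketch of its usual piecewise form, with the common default a = 3.7. The group version, which applies the penalty to the Euclidean norm of a block of basis coefficients, is an assumption about how the group-penalized formulation might be written; the paper's exact objective and the calibrated CCCP solver are not reproduced here, and the function names are hypothetical.

```python
import math


def scad_penalty(t, lam, a=3.7):
    """SCAD penalty evaluated at |t| for tuning parameters lam > 0 and a > 2."""
    t = abs(t)
    if t <= lam:
        return lam * t  # behaves like the lasso near zero
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2  # constant: large effects are not shrunk further


def group_scad_penalty(coefs, lam, a=3.7):
    """Assumed group form: SCAD applied to the Euclidean norm of one coefficient block."""
    norm = math.sqrt(sum(c * c for c in coefs))
    return scad_penalty(norm, lam, a)


small = scad_penalty(0.5, lam=1.0)
middle = scad_penalty(2.0, lam=1.0)
large = scad_penalty(10.0, lam=1.0)
block = group_scad_penalty([3.0, 4.0], lam=1.0)
```

The penalty is continuous at both breakpoints and flat beyond a * lam, which is the source of its nonconvexity and the reason a procedure such as CCCP is needed to optimize it.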
Title: Sparse functional linear models via calibrated concave-convex procedure (Journal of the Korean Statistical Society)
Pub Date: 2023-12-03 DOI: 10.1007/s42952-023-00241-4
Meisam Moghimbeygi, Mousa Golalizadeh
Shape, as an intrinsic concept, can be considered a source of information in some statistical analysis contexts. For instance, one of the important topics in morphology is the study of shape changes over time. From a topological viewpoint, shape data are points on a particular manifold, so constructing a longitudinal model for shape variation is not as trivial as it might seem. Rather than using the common parametric models for this task, we invoke Procrustes analysis within a nonparametric framework and propose a simple yet useful model to deal with shape changes. After recasting the problem as a nonparametric regression model, we use the weighted least squares method to estimate the related parameters. We also illustrate the new model in simulation studies and in the analysis of two biological data sets. Our proposed model shows its superiority when compared with its counterparts.
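The rotation step of Procrustes analysis can be sketched for planar landmarks as follows. This is a minimal illustration under the assumption of two-dimensional configurations with matched landmarks; it covers only centering and rotation (no scaling) and is not the longitudinal regression model of the paper. All function names and the example triangle are hypothetical.

```python
import math


def center(points):
    """Translate a landmark configuration so that its centroid is the origin."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return [(x - cx, y - cy) for x, y in points]


def rotate(points, theta):
    """Rotate planar points counterclockwise by theta radians about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def procrustes_rotation_angle(source, target):
    """Angle of the rotation that best aligns centered source with centered target
    in the least-squares sense."""
    a, b = center(source), center(target)
    dot = sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(a, b))
    cross = sum(ax * by - ay * bx for (ax, ay), (bx, by) in zip(a, b))
    return math.atan2(cross, dot)


# Hypothetical example: the target is the source triangle rotated by 30 degrees.
triangle = [(1.0, 0.0), (0.0, 2.0), (-1.0, -2.0)]
target = rotate(triangle, math.pi / 6)

theta = procrustes_rotation_angle(triangle, target)
aligned = rotate(center(triangle), theta)
```

In higher dimensions the optimal rotation is usually obtained from a singular value decomposition of the cross-product matrix of the two configurations; the closed-form angle above is the planar special case.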
Title: Nonparametric longitudinal regression model to analyze shape data using the Procrustes rotation (Journal of the Korean Statistical Society)
Pub Date: 2023-11-19 DOI: 10.1007/s42952-023-00238-z
Tianqing Liu, Xiaohui Yuan, Liuquan Sun
The regularization approach for variable selection has been well developed for semiparametric accelerated failure time (AFT) models, where the response variable is right censored. In the presence of missing data, this approach needs to be tailored to different missing data mechanisms. In this paper, we propose a flexible and generally applicable missing data mechanism for AFT models, which covers both ignorable and nonignorable missing data assumptions. We propose weighted rank (WR) estimators and corresponding penalized estimators of the regression parameters under this missing data mechanism. An advantage of the WR estimators and the corresponding penalized estimators is that they do not require specifying a missing data model for the proposed missing data mechanism. The theoretical properties of the WR and corresponding penalized estimators are established. Comprehensive simulation studies and a real data application further demonstrate the merits of our approach.
Title: Variable selection for semiparametric accelerated failure time models with nonignorable missing data (Journal of the Korean Statistical Society)
Pub Date: 2023-11-18 DOI: 10.1007/s42952-023-00240-5
Deru Kong, Wei Shen, Shengli Zhao, WenWu Wang
In real applications, correlated data are commonly encountered, and many techniques have been proposed to model them. However, the emphasis of these techniques has been on estimating the mean function under correlated errors, with scant attention paid to derivative estimation. In this paper, we propose locally weighted least squares regression based on different difference quotients to estimate derivatives of different orders under correlated errors. For the proposed estimators, we derive the asymptotic bias and variance under different error covariance structures; the estimators dramatically reduce the estimation variance compared with traditional methods. Furthermore, we establish their asymptotic normality for constructing confidence intervals. Based on the asymptotic mean integrated squared error, we provide a data-driven criterion for selecting the tuning parameters. Simulation studies show that the proposed method is more robust and efficient than four other popular methods. Finally, we illustrate the usefulness of the proposed method with a real data example.
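The idea of building a derivative estimate from difference quotients can be sketched as follows. This is a simplified illustration, not the estimator of the paper: it combines k symmetric difference quotients around one design point with weights proportional to j squared, which are the inverse-variance weights on an equispaced grid under the assumption of independent, equal-variance errors. Under correlated errors, the setting the paper addresses, the appropriate weights would differ. The function name and the example data are hypothetical.

```python
def weighted_difference_quotient(x, y, i, k):
    """Estimate the first derivative at x[i] from k symmetric difference quotients."""
    if i - k < 0 or i + k >= len(x):
        raise ValueError("need k design points on each side of index i")
    numerator = 0.0
    denominator = 0.0
    for j in range(1, k + 1):
        # Symmetric difference quotient using the points j steps away on each side.
        quotient = (y[i + j] - y[i - j]) / (x[i + j] - x[i - j])
        weight = j * j  # assumed weights: inverse variance under i.i.d. errors
        numerator += weight * quotient
        denominator += weight
    return numerator / denominator


# Hypothetical noiseless example: y = x^2 on an equispaced grid, derivative 2x.
grid = [0.1 * t for t in range(11)]
values = [v * v for v in grid]

estimate = weighted_difference_quotient(grid, values, i=5, k=3)
```

For a quadratic function every symmetric difference quotient equals the true derivative at the center point, so the estimate at x = 0.5 is 1 up to rounding; with noisy data the weighting trades bias against variance through the choice of k.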
Title: Robust and Efficient derivative estimation under correlated errors (Journal of the Korean Statistical Society)
Pub Date: 2023-11-14 DOI: 10.1007/s42952-023-00239-y
Semin Choi, Gunwoong Park
Title: Asymptotic bias of the $\ell_2$-regularized error variance estimator (Journal of the Korean Statistical Society)
Pub Date: 2023-11-13 DOI: 10.1007/s42952-023-00235-2
Rohan D. Koshti, Kirtee K. Kamalja
Title: A review on concomitants of order statistics and its application in parameter estimation under ranked set sampling (Journal of the Korean Statistical Society)
Pub Date: 2023-11-03 DOI: 10.1007/s42952-023-00232-5
A. M. Elsawah
Title: A novel doubling-tripling-threshold accepting hybrid algorithm for constructing asymmetric space-filling designs (Journal of the Korean Statistical Society)
Pub Date: 2023-10-25 DOI: 10.1007/s42952-023-00234-3
Xiaohui Yuan, Yue Wang, Yiming Wang, Tianqing Liu
Title: Variable selection for single-index models based on martingale difference divergence (Journal of the Korean Statistical Society)