Pub Date: 2022-10-02. DOI: 10.1080/23737484.2022.2133028
Jatesh Kumar, Parminder Singh, A. N. Gill
Abstract: In this paper, stepwise multiple testing procedures are proposed for comparing the variances of successive populations in a sequence of several independent normal populations. The proposed procedures have advantages over the single-step procedures and closed testing procedures available in the existing literature. They control the family-wise error rate (FWER) strongly and markedly improve power over the relevant single-step procedures. The closed testing procedure, which is step-down in nature and was developed for a testing problem, is very complex to implement, and this complexity grows as the number of successive comparisons increases. The relevant critical constants have been tabulated to facilitate the implementation of the proposed procedures. We also propose testing procedures for comparing successive populations in a sequence of two-parameter exponential populations with regard to their scale parameters. To assess the efficiency of the proposed procedures, simulated power comparisons with relevant existing procedures are presented, and the procedures are illustrated with two real-life data sets.
Title: Stepwise multiple testing procedures for the successive comparison of variances. Journal: Communications in Statistics Case Studies Data Analysis and Applications, pp. 649-662.
Pub Date: 2022-10-02. DOI: 10.1080/23737484.2022.2133027
Dhruba Das, H. Saikia, Dibyojyoti Bhattacharjee
Abstract: Every player is expected to contribute to the team’s batting effort irrespective of batting position. Each batsman is expected to score runs as quickly as possible without getting dismissed. Depending on the situation within a game, a batsman must often either play carefully to defend his wicket or strike out aggressively to score runs quickly. Based on the batsman’s expertise, the captain of the fielding team arranges the fielders so that the batsman cannot maximize his score. However, the arrangement of fielders also depends on the type of bowler (spin/fast) and on the team’s bowling strategy. This study therefore seeks the optimal playing strategies of a batsman against different bowling types using game theory. To substantiate the model with live data, a batsman’s strategies against different types of bowlers are explained.
Title: Optimal playing strategies of a batsman against bowling type in limited-over cricket: An application of game theory. Journal: Communications in Statistics Case Studies Data Analysis and Applications, pp. 738-751.
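As a minimal sketch of the game-theoretic idea in the abstract above, the following Python function solves a two-strategy zero-sum game in closed form. The payoff numbers, the two batting options (defend, attack) and the two options for the fielding side are hypothetical; the paper's actual model and data are not reproduced here.

```python
def solve_2x2_zero_sum(payoff):
    """Solve a 2x2 zero-sum game.

    payoff[i][j] is the row player's payoff (here: hypothetical runs for the
    batsman) when the row player uses option i and the column player option j.
    Returns (p, q, value): p is the probability that the row player uses
    option 0, q the probability that the column player uses option 0.
    """
    (a, b), (c, d) = payoff
    maximin = max(min(a, b), min(c, d))  # best guaranteed payoff for the row player
    minimax = min(max(a, c), max(b, d))  # best cap the column player can enforce
    if maximin == minimax:
        # A saddle point exists, so both sides play pure strategies.
        p = 1.0 if min(a, b) >= min(c, d) else 0.0
        q = 1.0 if max(a, c) <= max(b, d) else 0.0
        return p, q, float(maximin)
    # Otherwise each side mixes so that the opponent is indifferent.
    denom = a - b - c + d
    p = (d - c) / denom
    q = (d - b) / denom
    value = (a * d - b * c) / denom
    return p, q, value


# Hypothetical payoffs: rows = (defend, attack), columns = two field settings.
p, q, value = solve_2x2_zero_sum([[2, 5], [6, 3]])
print(p, q, value)
```

With these made-up payoffs the batsman mixes the two options equally and the value of the game is 4 runs; a real application would estimate the payoff matrix from match data.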
Pub Date: 2022-10-02. DOI: 10.1080/23737484.2022.2133026
Shehu Bala, Usman Abubakar Umar
Abstract: This study extends the generalized Poisson (GP-1 and GP-2) models to three-way contingency tables. We assume a mixed systematic component of the log-linear models for contingency tables to produce a linear transformation for the link function of generalized linear models (GLMs). Maximum likelihood estimators are derived for the model parameters. Over-dispersed malaria data from 2019 were considered for the study, and the GP-1 and GP-2 models for three-way contingency tables were used to model the data. Based on the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) goodness-of-fit measures, the GP-2 model outperformed the GP-1 model on the malaria data. We found that some parameters of the full model were statistically significant: malaria cases were sensitive to all ages considered in the study, and more people were infected with malaria in April, June, and July 2019.
Title: Extension of generalized Poisson log-linear regression models for analysing three-way contingency table: Application to malaria data. Journal: Communications in Statistics Case Studies Data Analysis and Applications, pp. 634-648.
Pub Date: 2022-10-02. DOI: 10.1080/23737484.2022.2139019
Suay Erees
Abstract: Dichotomizing continuous outcome variables is a common procedure in the medical sciences. When analyzing these variables using binary logistic regression, great attention should be paid to the choice of the measure of explained variation (R²). Since there are many different R² measures in logistic regression, evaluating their performance has become more important for making correct inferences about models. The purpose of this paper is to identify the asymptotically more efficient and reliable R² measures when analyzing models with a dichotomized outcome. The eight most recommended R² statistics and the ordinary least squares R² associated with the underlying continuous outcome are included. Their asymptotic distributions are studied, and they are compared under varying correlational conditions between outcome and covariate. Extensive simulations using the bootstrap method are conducted under two modeling scenarios. A real data example is also presented. The findings provide support and an important basis for making efficient decisions.
Title: Effects of dichotomizing continuous outcome on efficiencies of measures of explained variation in logistic regression: Simulation study and application. Journal: Communications in Statistics Case Studies Data Analysis and Applications, pp. 663-681.
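The abstract above does not name its eight R² statistics. Purely as an illustration, the sketch below computes four widely cited measures for logistic regression (McFadden, Cox-Snell, Nagelkerke and Tjur) from observed binary outcomes and fitted probabilities. The function names and the toy data are assumptions, and no claim is made that these four are among the paper's eight.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of outcomes y (0/1) under probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def pseudo_r2(y, p):
    """Return four R^2-type measures for a fitted binary model.

    y: observed 0/1 outcomes (both classes must be present).
    p: fitted probabilities, strictly between 0 and 1.
    """
    n = len(y)
    ybar = sum(y) / n
    ll_model = log_likelihood(y, p)
    ll_null = log_likelihood(y, [ybar] * n)  # intercept-only model
    mcfadden = 1.0 - ll_model / ll_null
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Nagelkerke rescales Cox-Snell by its maximum attainable value.
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll_null / n))
    p1 = [pi for yi, pi in zip(y, p) if yi == 1]
    p0 = [pi for yi, pi in zip(y, p) if yi == 0]
    tjur = sum(p1) / len(p1) - sum(p0) / len(p0)
    return {"mcfadden": mcfadden, "cox_snell": cox_snell,
            "nagelkerke": nagelkerke, "tjur": tjur}


# Toy example with made-up fitted probabilities.
print(pseudo_r2([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))
```

All four measures equal zero when every fitted probability equals the sample proportion, which is one simple sanity check before comparing them on dichotomized data.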
Pub Date: 2022-09-26. DOI: 10.1080/23737484.2022.2126413
A. Salmasnia, Ehsan Emamjomeh, M. Maleki
Abstract: In modern industries, statistical process monitoring (SPM) and maintenance management are extensively employed to increase the production rate of conforming items. Aiming to minimize the expected total cost per time unit subject to some statistical constraints, this study proposes a hybrid quality-maintenance model for an imperfect short-run process. To bring the proposed model closer to real short-run systems, the process mean is allowed to shift to an out-of-control condition due to the occurrence of several types of assignable causes. Furthermore, the time-to-failure is assumed to follow a non-homogeneous Poisson process, implying that the system may suddenly fail with an increasing failure rate function. Moreover, a non-uniform sampling scheme is developed in order to improve the system reliability. Finally, the main advantages of the proposed model are highlighted by two comparative studies. The first illustrates the efficiency of the non-uniform sampling scheme in increasing the in-control time interval and decreasing the expected total cost. The second confirms the importance of accounting for system failure when evaluating both the expected total cost and the in-control time interval.
Title: Joint design of control chart and maintenance policy under multiple assignable causes and random failures by considering the statistical constraints. Journal: Communications in Statistics Case Studies Data Analysis and Applications, pp. 607-633.
Pub Date: 2022-09-15. DOI: 10.1080/23737484.2022.2121947
Arunima S. Kannan, R. V. Vardhan
Abstract: The receiver operating characteristic (ROC) curve is one of the well-known classification tools. There are several bi-distributional ROC models in the literature, which can be applied only when there is prior knowledge of the class/status of the subject. If the predefined status of the subject is not known, a statistical methodology is needed to identify the homogeneous components within the data. Once this is done, the ROC curve can be modeled; here it is assumed that the data follow a non-normal distribution. In this paper, the need for handling non-normal data in the framework of a mixture model is discussed and demonstrated using a real data set and simulation studies. It is shown that the proposed mixGamma ROC model can replace the existing ROC models when the data are non-normal and multimodal.
Title: Estimation of area under the ROC curve in the framework of gamma mixtures. Journal: Communications in Statistics Case Studies Data Analysis and Applications, pp. 714-727.
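The quantity being estimated in the entry above, the area under the ROC curve (AUC), equals the probability that a score from the "positive" group exceeds a score from the "negative" group. The sketch below approximates that probability by Monte Carlo when each group follows a gamma mixture. The mixture weights, shapes and scales are made-up values, and this simulation is only an illustration of the AUC definition, not the paper's mixGamma estimator.

```python
import random


def sample_gamma_mixture(rng, weights, shapes, scales):
    """Draw one value from a finite mixture of gamma distributions."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gammavariate(shapes[k], scales[k])  # (shape, scale)


def mc_auc(negative, positive, n=20000, seed=0):
    """Monte Carlo estimate of AUC = P(positive score > negative score).

    negative, positive: tuples (weights, shapes, scales) describing the
    gamma mixture of each group (hypothetical parameters).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = sample_gamma_mixture(rng, *negative)
        y = sample_gamma_mixture(rng, *positive)
        hits += y > x
    return hits / n


# Hypothetical two-component mixtures for each group.
neg = ([0.6, 0.4], [2.0, 3.0], [1.0, 1.5])
pos = ([0.5, 0.5], [4.0, 6.0], [2.0, 2.5])
print(mc_auc(neg, pos))
```

When both groups share the same mixture the estimate should be close to 0.5, which corresponds to a classifier with no discriminating ability.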
Pub Date: 2022-09-13. DOI: 10.1080/23737484.2022.2117744
Daiane Chitko de Souza, C. Taconeli
Abstract: Education is one of the pillars of human societies, so achieving better indicators in this area is a common goal for different federative entities. In this context, identifying patterns in the results of such indicators, evaluated for different entities, and grouping the entities based on their similarities can lead to a better understanding of the educational scenario of a population. This knowledge, moreover, might support the formulation of public policies and inform decision-making by the responsible managers. In the present work, we present an illustrative example of the application of spatial and non-spatial clustering algorithms to data from six important indicators of basic education (middle and high school) evaluated for the municipalities of the state of Paraná, Brazil. The clusters provided by each method were evaluated according to their spatial distributions and educational features. The different clustering algorithms produced clusters with different levels of spatial contiguity and homogeneity regarding the educational indicators, reflecting the importance of choosing the appropriate clustering technique based on the research objectives.
Title: Spatial and non-spatial clustering algorithms in the analysis of Brazilian educational data. Journal: Communications in Statistics Case Studies Data Analysis and Applications, pp. 588-606.
Pub Date: 2022-08-12. DOI: 10.1080/23737484.2022.2106324
Sahika Gokmen, J. Lyhagen
Abstract: Random errors in the measurement process, called measurement error or misclassification, are inevitable and cause biased and inconsistent parameter estimates. Misclassification Simulation Extrapolation (MC-SIMEX) is a simulation-based estimation method for reducing parameter bias under misclassification. The main purpose of this study is to adapt the MC-SIMEX method to structural equation modeling (SEM). The effects of misclassification on the parameter estimates of binary explanatory variables in SEM and the performance of the MC-SIMEX method are investigated with both a Monte Carlo study and an empirical study. According to the main results, although MC-SIMEX corrected part of the bias, finding the best extrapolant function is just as important as estimating the misclassification matrix.
Title: Parameter estimation of structural equation models with misclassification: The MC-SIMEX approach. Journal: Communications in Statistics Case Studies Data Analysis and Applications, pp. 545-558.
Pub Date: 2022-08-08. DOI: 10.1080/23737484.2022.2107962
Mahendra Saha
Abstract: Process capability indices (PCIs) are frequently adopted to measure the performance of a process within the specifications. Although higher PCIs indicate higher process “quality,” they do not guarantee lower rejection rates. Thus, it is more appropriate to adopt a loss-based PCI for measuring process capability. In this paper, our first objective is to introduce a new capability index based on a symmetric loss function for a normal process, which provides a tailored way of incorporating the loss in capability analysis. Next, we estimate the PCI when the process follows the normal distribution using method of moments (MOM) estimation and assess the performance of the MOM estimator in terms of absolute bias and mean squared error through a simulation study across sample sizes. In addition, the generalized confidence interval (GCI) is employed to construct confidence intervals for the index. The performance of the GCI is assessed in terms of average width and coverage probability using Monte Carlo simulation. Finally, to illustrate the effectiveness of the proposed method of estimation and the GCI, three real data sets from electronic industries are analyzed.
Title: Applications of a new process capability index to electronic industries. Journal: Communications in Statistics Case Studies Data Analysis and Applications, pp. 574-587.
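The new index in the entry above is not named or defined in the abstract, so it is not reproduced here. For orientation only, the sketch below computes three classical indices from moment estimates of a normal process: Cp, Cpk, and the loss-based Cpm, which penalizes deviation of the mean from a target. The function name and the sample measurements are assumptions.

```python
import math
import statistics


def capability_indices(data, lsl, usl, target):
    """Return (Cp, Cpk, Cpm) using moment estimates of mean and spread.

    lsl, usl: lower and upper specification limits.
    target: target value used by the loss-based index Cpm.
    """
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)  # divisor n, the method-of-moments estimate
    cp = (usl - lsl) / (6.0 * sigma)
    cpk = min(usl - mu, mu - lsl) / (3.0 * sigma)
    # Cpm inflates the spread by the squared distance between mean and target.
    cpm = (usl - lsl) / (6.0 * math.sqrt(sigma ** 2 + (mu - target) ** 2))
    return cp, cpk, cpm


# Made-up measurements centred on the target of 10.
print(capability_indices([8.0, 12.0], lsl=4.0, usl=16.0, target=10.0))
```

When the process mean sits on the target, Cpm coincides with Cp; moving the target away from the mean lowers Cpm while leaving Cp unchanged, which is the sense in which such an index reflects loss rather than only spread.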
Pub Date: 2022-08-08. DOI: 10.1080/23737484.2022.2107961
D. Dai, Jianxin Pan, Yuli Liang
Abstract: Estimating the inverse covariance matrix is an essential part of many statistical methods. This paper proposes a regularized estimator for the inverse covariance matrix. The modified Cholesky decomposition (MCD) is used to construct positive definite estimators. Instead of directly regularizing the inverse covariance matrix itself, we impose regularization on the Cholesky factor. The estimated inverse covariance matrix is used to build the Mahalanobis distance (MD). The proposed method is evaluated by detecting outliers in simulations and empirical studies.
Title: Regularized estimation of the Mahalanobis distance based on modified Cholesky decomposition. Journal: Communications in Statistics Case Studies Data Analysis and Applications, pp. 559-573.
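The modified Cholesky decomposition mentioned in the entry above can be sketched as follows: each variable is regressed on its predecessors, the negated coefficients fill a unit lower triangular matrix T, and the residual variances form a diagonal vector d, so that the inverse covariance is T' diag(1/d) T. The ridge penalty on the regression coefficients is an assumption used here to illustrate regularizing the Cholesky factor; the abstract does not state which penalty the paper uses.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def mcd_factors(S, ridge=0.0):
    """Return (T, d) from covariance matrix S; T S T' = diag(d) when ridge == 0."""
    p = len(S)
    T = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    d = [S[0][0]] + [0.0] * (p - 1)
    for j in range(1, p):
        # Ridge-penalized regression of variable j on variables 0..j-1.
        A = [[S[a][b] + (ridge if a == b else 0.0) for b in range(j)]
             for a in range(j)]
        phi = solve(A, [S[a][j] for a in range(j)])
        for k in range(j):
            T[j][k] = -phi[k]
        # Residual variance implied by S; positive whenever S is positive definite.
        d[j] = sum(T[j][a] * S[a][b] * T[j][b]
                   for a in range(j + 1) for b in range(j + 1))
    return T, d


def mahalanobis_sq(x, mean, T, d):
    """Squared Mahalanobis distance using the precision T' diag(1/d) T."""
    z = [xi - mi for xi, mi in zip(x, mean)]
    e = [sum(T[j][k] * z[k] for k in range(j + 1)) for j in range(len(z))]
    return sum(ej * ej / dj for ej, dj in zip(e, d))


S = [[4.0, 2.0], [2.0, 3.0]]  # made-up covariance matrix
T, d = mcd_factors(S)
print(mahalanobis_sq([2.0, 0.0], [0.0, 0.0], T, d))
```

Because d stays positive for any ridge value when S is positive definite, the implied inverse covariance remains positive definite, which is the property the abstract attributes to the MCD construction. Observations with unusually large distances would then be flagged as candidate outliers.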