Comparing the Linear and Quadratic Discriminant Analysis of Diabetes Disease Classification Based on Data Multicollinearity

Autcha Araveeporn
{"title":"Comparing the Linear and Quadratic Discriminant Analysis of Diabetes Disease Classification Based on Data Multicollinearity","authors":"Autcha Araveeporn","doi":"10.1155/2022/7829795","DOIUrl":null,"url":null,"abstract":"Linear and quadratic discriminant analysis are two fundamental classification methods used in statistical learning. Moments (MM), maximum likelihood (ML), minimum volume ellipsoids (MVE), and t-distribution methods are used to estimate the parameter of independent variables on the multivariate normal distribution in order to classify binary dependent variables. The MM and ML methods are popular and effective methods that approximate the distribution parameter and use observed data. However, the MVE and t-distribution methods focus on the resampling algorithm, a reliable tool for high resistance. This paper starts by explaining the concepts of linear and quadratic discriminant analysis and then presents the four other methods used to create the decision boundary. Our simulation study generated the independent variables by setting the coefficient correlation via multivariate normal distribution or multicollinearity, often through basic logistic regression used to construct the binary dependent variable. For application to Pima Indian diabetic dataset, we expressed the classification of diabetes as the dependent variable and used a dataset of eight independent variables. This paper aimed to determine the highest average percentage of accuracy. Our results showed that the MM and ML methods successfully used large independent variables for linear discriminant analysis (LDA). However, the t-distribution method of quadratic discriminant analysis (QDA) performed better when using small independent variables.","PeriodicalId":301406,"journal":{"name":"Int. J. Math. Math. Sci.","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Int. J. Math. Math. Sci.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1155/2022/7829795","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

Linear and quadratic discriminant analysis are two fundamental classification methods used in statistical learning. The method of moments (MM), maximum likelihood (ML), minimum volume ellipsoid (MVE), and t-distribution methods are used to estimate the parameters of the independent variables under a multivariate normal distribution in order to classify a binary dependent variable. The MM and ML methods are popular and effective approaches that approximate the distribution parameters directly from the observed data, whereas the MVE and t-distribution methods rely on a resampling algorithm, a reliable tool with high resistance to outliers. This paper first explains the concepts of linear and quadratic discriminant analysis and then presents the four estimation methods used to construct the decision boundary. In the simulation study, the independent variables were generated from a multivariate normal distribution with specified correlation coefficients to induce multicollinearity, and the binary dependent variable was constructed through a basic logistic regression model. For the application to the Pima Indian diabetes dataset, diabetes status was taken as the dependent variable and eight variables were used as independent variables. The aim was to determine which method attains the highest average percentage of accuracy. The results show that the MM and ML methods performed best for linear discriminant analysis (LDA) with a large number of independent variables, whereas the t-distribution method for quadratic discriminant analysis (QDA) performed better with a small number of independent variables.
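As a rough illustration of the simulation design described in the abstract, the sketch below generates equicorrelated multivariate normal predictors, constructs a binary response through a basic logistic model, and compares LDA and QDA test accuracy. It is not the paper's code: scikit-learn's default estimators stand in for the moment and maximum-likelihood methods, the robust MVE and t-distribution estimators are not implemented here, and the sample size, correlation level, and coefficients are illustrative assumptions.

```python
# A minimal sketch (not from the paper): simulate multicollinear predictors,
# build a binary response via a logistic model, and compare LDA vs. QDA.
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Equicorrelated predictors: correlation rho between every pair of p variables
# (the values of n, p, and rho are assumptions, not taken from the paper).
n, p, rho = 1000, 4, 0.8
cov = rho * np.ones((p, p)) + (1.0 - rho) * np.eye(p)
X = rng.multivariate_normal(mean=np.zeros(p), cov=cov, size=n)

# Binary dependent variable from a basic logistic model (coefficients assumed).
beta = np.ones(p)
prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
y = rng.binomial(1, prob)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# LDA assumes a common covariance matrix across classes; QDA fits one per class.
for name, clf in [("LDA", LinearDiscriminantAnalysis()),
                  ("QDA", QuadraticDiscriminantAnalysis())]:
    clf.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, clf.predict(X_te))
    print(f"{name}: test accuracy = {acc:.3f}")
```

Raising rho toward 1 strengthens the multicollinearity among the predictors, which is the setting the paper's comparison of estimation methods targets.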