Improved Principal Component Analysis and Linear Discriminant Analysis for the Determination of Origin of Coffee Beans using

Endale Deribe Jiru, Berhanu Guta Wordofa, M. Redi-Abshiro
SINET: Ethiopian Journal of Science. Published 2022-04-27. DOI: 10.4314/sinet.v45i1.1 (https://doi.org/10.4314/sinet.v45i1.1)

Abstract

In this work, an improved Principal Component Analysis (PCA) method is used to better determine the geographical origins of Ethiopian green coffee beans. In commercially available and widely used PCA methods, the dataset is commonly normalized using the Z-score procedure, which reduces the influence of the spread of the data (differences in degree of dispersion) on the principal components (PCs). The improved method introduces a new normalization procedure intended to improve the spread (dispersion) of data points around the mean. The PCs computed with the improved procedure can reflect the information in the original dataset significantly better, and the improved PCA retains more of the dispersion information in the original dataset than Z-score-based PCA does. The improved PCA was then used to identify the most discriminating variables for the coffee samples and, based on these, a Linear Discriminant Analysis (LDA) model was developed to classify and predict samples. At the regional level, the recognition and prediction abilities of the improved PCA and LDA were, respectively, 95.7% and 94% using chlorogenic acid (CGA) content, 91% and 97% using fatty acid (FA) content, and 99% and 100% using the combined CGA and FA contents. For the PCA that Mehari et al. (2016, 2019) applied to the same dataset, the reported regional-level recognition and prediction abilities were 91% and 90% using CGA content and 95% and 92% using FA content, respectively. These results indicate that the newly introduced method is superior and achieves the best discrimination of coffee beans. The combined analysis of CGA and FA concentrations is a useful tool for determining the origin of coffee beans, and we recommend that the concerned bodies use it for the characterization, classification and authentication of Ethiopian coffee beans according to their geographical origins.
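
The abstract's point about Z-score normalization can be illustrated with a minimal sketch. It uses synthetic stand-in data, not the coffee dataset, and it does not reproduce the authors' improved normalization, which the abstract does not specify. The helper names `zscore` and `pca_eigenvalues` are hypothetical. The sketch only shows that Z-scoring forces every variable to unit variance, so differences in dispersion between variables no longer influence the principal components; mean-centering alone is shown for contrast.

```python
import numpy as np

def zscore(X):
    """Column-wise Z-score: subtract the mean, divide by the sample standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca_eigenvalues(X):
    """Eigenvalues of the sample covariance matrix of X, largest first."""
    cov = np.cov(X, rowvar=False)
    return np.linalg.eigvalsh(cov)[::-1]

rng = np.random.default_rng(0)
# Synthetic data (an assumption for illustration): 60 samples,
# 3 independent variables with very different spreads.
X = rng.normal(size=(60, 3)) * np.array([10.0, 1.0, 0.1])

Z = zscore(X)
raw_variances = X.var(axis=0, ddof=1)   # differ by orders of magnitude
z_variances = Z.var(axis=0, ddof=1)     # all equal to 1 after Z-scoring

eig_z = pca_eigenvalues(Z)                          # PCA on Z-scored data
eig_centered = pca_eigenvalues(X - X.mean(axis=0))  # PCA on mean-centered data only

# Share of total variance carried by the first principal component.
share_z = eig_z[0] / eig_z.sum()
share_centered = eig_centered[0] / eig_centered.sum()

print("variances before Z-score:", raw_variances)
print("variances after Z-score: ", z_variances)
print("PC1 share, Z-scored:     ", share_z)
print("PC1 share, centered only:", share_centered)
```

After Z-scoring, the covariance matrix is the correlation matrix, so its eigenvalues sum to the number of variables and the original dispersion differences are gone. With centering only, the high-variance variable dominates the first component. Neither variant is the paper's method; they are the two standard baselines against which a dispersion-preserving normalization would be compared.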