Performance evaluation of variable selection methods coupled with partial least squares regression to determine the target component in solid samples

Journal of Near Infrared Spectroscopy (IF 1.6; Zone 4, Chemistry; JCR Q3, Chemistry, Applied). Pub Date: 2022-05-12. DOI: 10.1177/09670335221097236
Na Zhao, Zhisheng Wu, Chunying Wu, Shuyu Wang, Xueyan Zhan

Abstract

Variable selection can improve the robustness and prediction accuracy of partial least squares (PLS) regression models and reduce computation time by selecting an optimal subset of variables in multivariate calibration. In this study, the performance of two classes of variable selection methods, wavelength interval selection and individual wavelength selection, coupled with PLS regression is investigated using experimental data on the asiaticoside (AS) and madecassoside (MS) contents of centella total glucosides (CTG) and a public corn dataset. The methods studied are interval partial least squares (iPLS), backward interval partial least squares (biPLS), synergy interval partial least squares (siPLS), competitive adaptive reweighted sampling (CARS), uninformative variable elimination (UVE) and variable importance in projection (VIP). The results show that variable selection improved model performance compared with full-spectrum modeling, and that every variable selection method improved the prediction of AS or MS content in CTG. When the number of latent variables in the PLS models is kept below 10, as in practical applications, the RPD value of the AS model built with iPLS is 7.5 and the RPD value of the MS model built with biPLS is 2.9. Wavelength interval selection gives better results than individual wavelength selection, especially with iPLS and biPLS. Consistent results were obtained for moisture in the public corn dataset, where the biPLS model has an RPD value of 1.6. Therefore, wavelength interval selection methods such as iPLS or biPLS are appropriate for improving the accuracy and robustness of PLS models that determine the content of target components in solid samples.
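To make the interval-selection idea concrete, the following is a minimal sketch of iPLS rather than the authors' implementation. It assumes the simplest variant: split the spectrum into equal-width intervals, build a PLS model on each interval alone, and keep the interval with the lowest cross-validated error. The abstract does not define RPD, so the sketch assumes the common definition (standard deviation of the reference values divided by the root mean square error of prediction). All function names, the synthetic data and the parameter defaults (10 intervals, at most 5 latent variables, 5 folds) are hypothetical choices for illustration.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with NIPALS; return (x_mean, y_mean, coef)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                    # weights: covariance of each variable with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                 # nothing left to explain
            break
        w = w / norm
        t = Xk @ w                       # scores
        tt = t @ t
        p = Xk.T @ t / tt                # X loadings
        c = (yk @ t) / tt                # y loading
        Xk = Xk - np.outer(t, p)         # deflate X and y
        yk = yk - c * t
        W.append(w); P.append(p); q.append(c)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector B = W (P'W)^-1 q
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean

def rmsecv(X, y, n_components, n_folds=5):
    """Root mean square error of cross-validation with interleaved folds."""
    idx = np.arange(len(y))
    press = 0.0
    for k in range(n_folds):
        test = idx[k::n_folds]
        train = np.setdiff1d(idx, test)
        model = pls1_fit(X[train], y[train], n_components)
        press += np.sum((y[test] - pls1_predict(model, X[test])) ** 2)
    return np.sqrt(press / len(y))

def ipls(X, y, n_intervals=10, max_components=5):
    """Return (rmsecv, n_components, column_indices) of the best single interval."""
    results = []
    for cols in np.array_split(np.arange(X.shape[1]), n_intervals):
        limit = min(max_components, len(cols))
        scores = [rmsecv(X[:, cols], y, a) for a in range(1, limit + 1)]
        best = int(np.argmin(scores))
        results.append((scores[best], best + 1, cols))
    return min(results, key=lambda r: r[0])

def rpd(y_true, y_pred):
    """Assumed definition: SD of reference values divided by RMSEP."""
    rmsep = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return np.std(y_true, ddof=1) / rmsep

def make_demo_data(n_samples=60, n_wavelengths=100, seed=0):
    """Synthetic spectra: one Gaussian band centred at index 45 scales with y."""
    rng = np.random.default_rng(seed)
    y = rng.uniform(0.0, 1.0, n_samples)
    band = np.exp(-0.5 * ((np.arange(n_wavelengths) - 45) / 3.0) ** 2)
    X = np.outer(y, band) + rng.normal(0.0, 0.05, (n_samples, n_wavelengths))
    return X, y

X, y = make_demo_data()
score, n_comp, cols = ipls(X[:40], y[:40])               # calibrate on 40 samples
model = pls1_fit(X[:40][:, cols], y[:40], n_comp)
print("selected columns:", cols[0], "to", cols[-1], "| latent variables:", n_comp)
print("RPD on held-out samples:", rpd(y[40:], pls1_predict(model, X[40:][:, cols])))
```

On this synthetic data the informative band sits inside a single interval, which is the situation in which interval selection is expected to help. biPLS and siPLS differ only in the search strategy: biPLS removes the least useful interval one at a time, and siPLS evaluates combinations of several intervals.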