Bayesian and partial least square global forage calibrations models developed by an iterative procedure using R

A. Ferragina, F. Benozzo, P. Berzaghi
Proceedings of the 18th International Conference on Near Infrared Spectroscopy, published 2019-02-19
DOI: 10.1255/NIR2017.051 (https://doi.org/10.1255/NIR2017.051)
Citations: 3

Abstract

Author Summary: The aim of our study was to test an iterative validation process implemented in R and to assess the accuracy of the best selected equations, developed using two different regression algorithms: Partial Least Squares (PLS) and Bayesian regression. A data set (Set_a) with 3187 records from 6 different types of forage was used. Calibrations were tested for protein, neutral detergent fiber and acid detergent fiber. For each sample, a spectrum was collected using a FOSS NIRSystem (1100–2498 nm). A subset of 20 samples per forage type (Set_ext; 120 samples) was randomly selected for the final validation of the best selected equations. The remaining samples (Set_b = Set_a − Set_ext) were used for the iterative calibration process. In each iteration, Set_b was randomly divided into a testing set (Set_tst; 10% of Set_b) and a training set (Set_trn = Set_b − Set_tst); 300 iterations were run. All computations were done in the R environment. The packages used were “pls” for PLS, “BGLR” for the Bayesian models and “prospectr” for the spectral treatments. In each iteration we used three spectral treatments (raw, first derivative, and standard normal variate with detrend), two approaches for selecting the optimal number of PLS components, and the Bayesian model. Nine types of equations were therefore developed and tested in each iteration [(2 PLS techniques + 1 Bayesian) × 3 spectral treatments]. Across the 300 iterations, for each of the 9 equation types, the best equation (lowest RMSE) and the average of the best 25% (RMSE below the first quartile) were selected and validated by forage type. R demonstrated its potential for chemometric processing of big data sets with complex statistical procedures. R² higher than 0.9 was obtained for almost all the calibrations. In the external validation the Bayesian models in many cases outperformed the commonly used PLS, demonstrating that an alternative for improving prediction accuracy exists.
The present work has demonstrated that iterative validation subsampling on big data can lead to the selection of proper equations, and that it can be done using R.
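
The subsampling scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the study used R with the pls, BGLR and prospectr packages, whereas the sketch below uses only the Python standard library, synthetic reference values, and a placeholder "model" that predicts the training mean in place of a PLS or Bayesian calibration. All function and variable names are hypothetical, and rounding the 10% test split to the nearest integer is an assumption, since the abstract does not say how the split size was computed.

```python
import random
import statistics


def stratified_holdout(labels, n_per_class, rng):
    """Hold out n_per_class random samples per label; return (held_out, remaining) indices."""
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    held_out = []
    for lab in sorted(by_class):
        held_out.extend(rng.sample(by_class[lab], n_per_class))
    held = set(held_out)
    remaining = [i for i in range(len(labels)) if i not in held]
    return sorted(held_out), remaining


def random_split(indices, test_fraction, rng):
    """Shuffle indices and return (train, test); test size is rounded (an assumption)."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def select_best(rmse_values):
    """Return (iteration with lowest RMSE, iterations with RMSE below the first quartile)."""
    q1 = statistics.quantiles(rmse_values, n=4)[0]
    best = min(range(len(rmse_values)), key=rmse_values.__getitem__)
    top = [i for i, r in enumerate(rmse_values) if r < q1]
    return best, top


rng = random.Random(0)

# Synthetic stand-in for Set_a: 3187 records spread over 6 forage types.
forage_types = [f"forage_{i % 6}" for i in range(3187)]
reference = [rng.gauss(15.0, 3.0) for _ in forage_types]  # fake lab values

# Set_ext: 20 samples per forage type; Set_b: everything else.
set_ext, set_b = stratified_holdout(forage_types, 20, rng)

rmse_per_iter = []
for _ in range(300):
    set_trn, set_tst = random_split(set_b, 0.10, rng)
    # Placeholder calibration: predict the training mean for every test sample.
    fitted_mean = statistics.fmean([reference[i] for i in set_trn])
    observed = [reference[i] for i in set_tst]
    rmse_per_iter.append(rmse(observed, [fitted_mean] * len(observed)))

best_iter, top_quartile = select_best(rmse_per_iter)
```

In the study this loop would be repeated for each of the nine equation types, and the equations from `best_iter` and from the `top_quartile` iterations would then be evaluated on the held-out `set_ext`, broken down by forage type.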