Near-infrared spectral interval screening based on hierarchical variables clustering and group SCAD in multivariate calibration

IF 2.7 | Zone 3, Chemistry | Q2, CHEMISTRY, ANALYTICAL | Vibrational Spectroscopy | Pub Date: 2024-02-17 | DOI: 10.1016/j.vibspec.2024.103664
Chen-Hao Huang
Vibrational Spectroscopy, vol. 131, Article 103664. https://www.sciencedirect.com/science/article/pii/S0924203124000171
Citations: 0

Abstract

Spectral interval screening is a critical step in multivariate calibration because it can improve both the predictive performance of the model and the interpretability of the data. This study proposes a novel interval selection method, VCG-PLS, which combines hierarchical variable clustering and the group smoothly clipped absolute deviation (group SCAD) penalty with partial least squares (PLS). The method is designed to select informative wavelength intervals for near-infrared (NIR) spectroscopic data analysis and consists of three main steps. First, hierarchical clustering is applied to the wavelengths (variables), which partitions the variables into groups at each hierarchy level and obtains all possible wavelength intervals. Second, the groups of variables obtained from the different hierarchy levels are given as input to group SCAD, which generates a set of potential groups for each value of the regularization parameter. Finally, a collection of PLS models is constructed recursively, each using all wavelength intervals except one, until the optimal wavelength intervals are obtained; the optimal intervals are those that give the lowest root mean square error of prediction (RMSEP). VCG-PLS thus integrates the advantages of hierarchical variable clustering and group SCAD, and is an efficient technique for improving the performance of PLS in interval selection. Its performance was tested on three real NIR datasets. The results demonstrate that VCG-PLS can improve prediction performance with fewer variables and may be a good wavelength interval selection strategy.
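The final step described in the abstract (building PLS models with all intervals except one and keeping the subset with the lowest RMSEP) can be illustrated with a minimal sketch. This is not the authors' implementation: it reads the step as a greedy backward elimination, it assumes the candidate intervals have already been produced by the clustering and group SCAD steps, and it scores subsets on a single held-out set. The function names (`pls1_fit`, `rmsep`, `backward_interval_elimination`), the NIPALS-style PLS1 fit, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model (NIPALS); return (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(min(n_components, X.shape[1])):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm < 1e-12:              # no covariance left to extract
            break
        w = w / norm
        t = Xr @ w                    # scores
        tt = t @ t
        p = Xr.T @ t / tt             # X loadings
        c = (yr @ t) / tt             # y loading
        Xr = Xr - np.outer(t, p)      # deflate X
        yr = yr - c * t               # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean


def rmsep(X_tr, y_tr, X_te, y_te, cols, n_components):
    """Root mean square error of prediction using only the columns in `cols`."""
    coef, x_mean, y_mean = pls1_fit(X_tr[:, cols], y_tr, n_components)
    pred = y_mean + (X_te[:, cols] - x_mean) @ coef
    return float(np.sqrt(np.mean((y_te - pred) ** 2)))


def backward_interval_elimination(X_tr, y_tr, X_te, y_te, intervals, n_components=3):
    """Drop one interval per round; return the subset with the lowest RMSEP."""
    def score(keep):
        cols = np.concatenate([intervals[i] for i in keep])
        return rmsep(X_tr, y_tr, X_te, y_te, cols, n_components)

    active = list(range(len(intervals)))
    history = [(tuple(active), score(active))]   # start from the full set
    while len(active) > 1:
        # Try every "all intervals except one" model and keep the best one.
        trials = [[j for j in active if j != i] for i in active]
        best_rmse, active = min(((score(k), k) for k in trials), key=lambda s: s[0])
        history.append((tuple(active), best_rmse))
    best_keep, best_rmse = min(history, key=lambda h: h[1])
    return list(best_keep), best_rmse, history


# Synthetic example (assumption): only the middle interval carries signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 30))
y = X[:, 10:20].sum(axis=1) + 0.1 * rng.normal(size=300)
intervals = [np.arange(0, 10), np.arange(10, 20), np.arange(20, 30)]
best, best_rmse, history = backward_interval_elimination(
    X[:200], y[:200], X[200:], y[200:], intervals
)
print("selected intervals:", best, "RMSEP:", round(best_rmse, 3))
```

In practice the RMSEP would more likely be estimated by cross-validation or on an independent prediction set that is not also used for selection, and the number of PLS components would be tuned for each subset.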

Source journal
Vibrational Spectroscopy (Chemistry, Analytical Chemistry)
CiteScore: 4.70
Self-citation rate: 4.00%
Articles published: 103
Review time: 52 days
Journal description: Vibrational Spectroscopy provides a vehicle for the publication of original research that focuses on vibrational spectroscopy. It covers infrared, near-infrared and Raman spectroscopies and publishes papers dealing with developments in applications, theory, techniques and instrumentation. The topics covered by the journal include: Sampling techniques, Vibrational spectroscopy coupled with separation techniques, Instrumentation (Fourier transform, conventional and laser based), Data manipulation, Spectra-structure correlation and group frequencies. The application areas covered include: Analytical chemistry, Bio-organic and bio-inorganic chemistry, Organic chemistry, Inorganic chemistry, Catalysis, Environmental science, Industrial chemistry, Materials science, Physical chemistry, Polymer science, Process control, Specialized problem solving.