Outlier Removal with Weight Penalization and Aggregation: A Robust Variable Selection Method for Enhancing Near-Infrared Spectral Analysis Performance

Analytical Chemistry (IF 6.7; Zone 1, Chemistry; JCR Q1, Chemistry, Analytical) Pub Date: 2025-02-19 DOI: 10.1021/acs.analchem.4c07007
Beibei Li, Wenting Li, Junwei Guo, Hongbo Wang, Ran Wan, Yu Liu, Meijuan Fan, Cong Wang, Song Yang, Le Zhao, Cong Nie

Abstract

Full-wavelength near-infrared (NIR) spectroscopy faces significant challenges due to the strong collinearity among spectral variables and the presence of variables that are highly sensitive to sample fluctuations. Additionally, not all spectral variables contribute equally to the NIR model. Weakly influential variables, although not important on their own, can provide substantial improvement when combined with stronger variables, thus increasing both model stability and prediction accuracy. Therefore, this study proposes a new variable selection method called outlier removal with weight penalization and aggregation (OR-WPA). The method begins by removing outlier spectral variables with a high coefficient of variation, which enhances model stability. During the variable selection process, multiple submodels are constructed based on variable subsets, with variable weights assigned according to the absolute values of regression coefficients. A moving window is applied to average the weights, and variables with excessively high weights are penalized, promoting the selection of weakly influential variables that positively contribute to model accuracy. The variable space is iteratively reduced, and the subset of variables associated with the highest predictive accuracy is selected as the final characteristic variable combination. The OR-WPA method was evaluated on three NIR spectral data sets covering corn, heated tobacco substrate, and flue-cured tobacco. Its results were compared with three advanced variable selection methods: Monte Carlo uninformative variable elimination, competitive adaptive reweighted sampling, and bootstrapping soft shrinkage. The comparison indicates that OR-WPA delivers better predictive performance, particularly in predicting low-content components, where it significantly enhances both the accuracy and stability of the NIR model.
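
The abstract does not give formulas, thresholds, or window sizes, so the following is only a rough illustration of two of the steps it names: dropping spectral variables with a high coefficient of variation, and smoothing then penalizing submodel weights. The function names, the quantile-based cutoffs, and the choice of capping as the penalty are all assumptions made for this sketch and should not be read as the authors' implementation.

```python
import numpy as np


def remove_high_cv_variables(X, cv_quantile=0.95):
    """Return indices of variables (columns) whose coefficient of variation
    is at or below the given quantile. The quantile cutoff is an assumption;
    the abstract only says that high-CV variables are removed.
    Assumes no column has a mean of zero."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    cv = std / np.abs(mean)
    return np.flatnonzero(cv <= np.quantile(cv, cv_quantile))


def aggregate_weights(coefs, window=5, penalty_quantile=0.9):
    """Turn submodel regression coefficients into variable weights.

    coefs: array of shape (n_submodels, n_variables), with zeros for
    variables a submodel did not use. Assumes window <= n_variables.
    """
    # Weight each variable by its summed absolute regression coefficient.
    w = np.abs(coefs).sum(axis=0)
    w = w / w.sum()
    # Moving-window average over neighbouring variables.
    kernel = np.ones(window) / window
    smoothed = np.convolve(w, kernel, mode="same")
    # Hypothetical penalty: cap weights above a quantile so that a few
    # dominant variables do not crowd out weakly influential ones.
    cap = np.quantile(smoothed, penalty_quantile)
    penalized = np.minimum(smoothed, cap)
    return penalized / penalized.sum()
```

In a full procedure these weights would presumably drive the sampling of variable subsets for the next round of submodels, with the variable space shrinking each iteration and the best-performing subset kept at the end; those parts are omitted here because the abstract gives no detail on them.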
