A novel two-phase near-infrared and midinfrared wavelength selection framework for sample classification

Journal of Chemometrics · Published: 2024-02-17 · DOI: 10.1002/cem.3536 · Impact factor: 2.3 (Zone 4, Chemistry)
Juliana Fontes, Michel J. Anzanello, João B. G. Brito, Guilherme B. Bucco

Abstract

Spectral data describing product samples typically comprise a large number of noisy and irrelevant wavelengths that tend to undermine the performance of multivariate predictive techniques. This paper proposes a two-phase framework that couples a wavelength preselection step, guided by wavelength clustering, with a wrapper-based selection strategy. The first phase prunes the less informative wavelengths using spectral clustering, a technique deemed suitable for the Fourier transform infrared (FTIR) and near-infrared (NIR) spectroscopy data at hand. The preselected wavelengths then undergo a second selection phase based on combinations of wavelength importance indices (Bhattacharyya distance, Chi-square, ReliefF, and Gini) and classification techniques (support vector machine, k-nearest neighbors, and random forest). When applied to 11 FTIR datasets from different domains, the recommended combination of importance index and classifier raised the average accuracy from 0.863 to 0.918 (a relative increase of 6.37%) while retaining, on average, 3.84% of the original wavelengths. The framework also reduced the computational time of the selection process.
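The abstract does not specify the clustering affinity, the pruning rule applied inside each cluster, or the wrapper's search strategy, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes binary class labels (0/1), uses the absolute Pearson correlation between wavelengths as the spectral-clustering affinity, keeps the highest-scoring wavelengths of each cluster as a hypothetical phase-1 pruning rule, scores wavelengths with the Bhattacharyya distance (one of the four indices named above) under a univariate Gaussian assumption, and implements phase 2 as backward elimination scored by leave-one-out k-nearest-neighbor accuracy. All function and parameter names (`two_phase_select`, `n_clusters`, `keep_per_cluster`, and so on) are hypothetical, and the data are synthetic.

```python
import numpy as np

# Hypothetical sketch of a two-phase wavelength selection; NOT the authors' code.
# Assumptions: binary labels (0/1), |Pearson correlation| as clustering affinity,
# Bhattacharyya distance as importance index, leave-one-out kNN as wrapper score.


def bhattacharyya_scores(X, y):
    """Per-wavelength Bhattacharyya distance between the two classes (univariate Gaussian)."""
    a, b = X[y == 0], X[y == 1]
    m1, m2 = a.mean(axis=0), b.mean(axis=0)
    v1, v2 = a.var(axis=0) + 1e-12, b.var(axis=0) + 1e-12
    return 0.25 * (m1 - m2) ** 2 / (v1 + v2) + 0.5 * np.log((v1 + v2) / (2 * np.sqrt(v1 * v2)))


def spectral_clusters(X, k, n_iter=50):
    """Phase 1 helper: cluster wavelengths (columns of X) by normalized spectral clustering."""
    A = np.nan_to_num(np.abs(np.corrcoef(X, rowvar=False)))  # p x p wavelength affinity
    np.fill_diagonal(A, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1) + 1e-12)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]  # normalized Laplacian
    _, vecs = np.linalg.eigh(L)  # eigenvalues ascending; first k eigenvectors embed the graph
    U = vecs[:, :k]
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
    # Plain k-means on the embedded rows, initialized from evenly spaced wavelengths.
    centers = U[np.linspace(0, len(U) - 1, k).astype(int)]
    for _ in range(n_iter):
        dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels


def loo_knn_accuracy(X, y, k=3):
    """Leave-one-out accuracy of a k-nearest-neighbor majority vote (binary labels, odd k)."""
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(D, np.inf)  # a sample may not vote for itself
    nn = np.argsort(D, axis=1)[:, :k]
    pred = (y[nn].mean(axis=1) > 0.5).astype(int)
    return float((pred == y).mean())


def two_phase_select(X, y, n_clusters=10, keep_per_cluster=2):
    scores = bhattacharyya_scores(X, y)
    labels = spectral_clusters(X, n_clusters)
    # Phase 1 (hypothetical pruning rule): keep the top-scoring wavelengths of each cluster.
    pre = []
    for j in range(n_clusters):
        members = np.flatnonzero(labels == j)
        pre.extend(members[np.argsort(scores[members])[::-1][:keep_per_cluster]].tolist())
    pre.sort(key=lambda i: scores[i], reverse=True)
    # Phase 2 (wrapper): drop the least important wavelength each step, keep the best subset.
    subset = list(pre)
    best_subset, best_acc = list(subset), loo_knn_accuracy(X[:, subset], y)
    while len(subset) > 1:
        subset = subset[:-1]
        acc = loo_knn_accuracy(X[:, subset], y)
        if acc >= best_acc:  # ties favor the smaller subset
            best_subset, best_acc = list(subset), acc
    return sorted(best_subset), best_acc


# Usage on synthetic "spectra": class 1 carries an extra band at indices 120-124.
rng = np.random.default_rng(0)
n, p = 60, 200
y = np.repeat([0, 1], n // 2)
base = np.sin(np.linspace(0, 6, p))
X = base * (1 + 0.1 * rng.standard_normal((n, 1))) + 0.02 * rng.standard_normal((n, p))
X[y == 1, 120:125] += 0.5
selected, acc = two_phase_select(X, y)
print(f"kept {len(selected)} of {p} wavelengths, LOO kNN accuracy = {acc:.3f}")
```

Because the importance index and the wrapper's classifier are isolated in `bhattacharyya_scores` and `loo_knn_accuracy`, trying a different index/classifier combination (in the spirit of the combinations the abstract compares) means replacing only one function.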

Source journal
Journal of Chemometrics (Chemistry: Analytical Chemistry)
CiteScore: 5.20
Self-citation rate: 8.30%
Articles published: 78
Review time: 2 months
About the journal: The Journal of Chemometrics is devoted to the rapid publication of original scientific papers, reviews and short communications on fundamental and applied aspects of chemometrics. It also provides a forum for the exchange of information on meetings and other news relevant to the growing community of scientists who are interested in chemometrics and its applications. Short, critical review papers are a particularly important feature of the journal, in view of the multidisciplinary readership at which it is aimed.