A novel two-phase near-infrared and midinfrared wavelength selection framework for sample classification

Journal of Chemometrics, vol. 38, no. 3. Pub Date: 2024-02-17. DOI: 10.1002/cem.3536. IF 2.1; Zone 4 (Chemistry); JCR Q1

Juliana Fontes, Michel J. Anzanello, João B. G. Brito, Guilherme B. Bucco

Citations: 0

Abstract

Spectral data describing product samples typically comprise a large number of noisy and irrelevant wavelengths, which tend to undermine the performance of multivariate predictive techniques. This paper proposes a two-phase framework that couples a wavelength preselection step guided by wavelength clustering with a wrapper-based strategy. The first phase prunes the data, removing the less informative wavelengths by means of spectral clustering, a technique deemed suitable for the Fourier transform infrared (FTIR) and near-infrared (NIR) spectroscopy data at hand. The preselected wavelengths then undergo a second selection phase that combines different wavelength importance indices (i.e., Bhattacharyya distance, Chi-square, ReliefF, and Gini) with classification techniques (i.e., support vector machine, k-nearest neighbors, and random forest). When applied to 11 FTIR datasets from different domains, the recommended combination of importance index and classifier increased the average accuracy by a relative 6.37% (from 0.863 to 0.918) while retaining, on average, 3.84% of the original spectra. The framework also improved the computational time of the selection process.
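As a rough illustration of the two-phase idea described in the abstract, the following Python sketch runs a preselection step followed by a wrapper step on synthetic spectra. It is not the authors' implementation. Phase one substitutes a simple grouping of adjacent, highly correlated wavelengths for the spectral clustering used in the paper, and phase two uses only one importance index (Bhattacharyya distance under a univariate Gaussian assumption) and one classifier (1-nearest neighbor scored by leave-one-out accuracy). The synthetic data, the binary 0/1 labels, the 0.95 correlation threshold, and all function names are assumptions made for illustration.

```python
import math
import random
from statistics import mean, pvariance


def bhattacharyya(a, b):
    """Bhattacharyya distance between univariate Gaussians fitted to a and b."""
    va, vb = pvariance(a) + 1e-12, pvariance(b) + 1e-12
    return (0.25 * (mean(a) - mean(b)) ** 2 / (va + vb)
            + 0.5 * math.log((va + vb) / (2.0 * math.sqrt(va * vb))))


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy + 1e-12)


def preselect(X, y, threshold=0.95):
    """Phase 1 (stand-in for spectral clustering): group adjacent wavelengths
    whose correlation is at least `threshold`, keep the best-scoring one per group."""
    n_wl = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_wl)]
    scores = [bhattacharyya([v for v, c in zip(col, y) if c == 0],
                            [v for v, c in zip(col, y) if c == 1])
              for col in cols]
    groups, current = [], [0]
    for j in range(1, n_wl):
        if abs(pearson(cols[j - 1], cols[j])) >= threshold:
            current.append(j)
        else:
            groups.append(current)
            current = [j]
    groups.append(current)
    kept = [max(g, key=lambda j: scores[j]) for g in groups]
    return kept, scores


def loo_knn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier on `features`."""
    hits = 0
    for i, xi in enumerate(X):
        nearest = min((k for k in range(len(X)) if k != i),
                      key=lambda k: sum((xi[f] - X[k][f]) ** 2 for f in features))
        hits += y[nearest] == y[i]
    return hits / len(X)


def wrapper_select(X, y, candidates, scores):
    """Phase 2: add candidates in decreasing score order and keep the
    smallest ranked prefix that reaches the highest leave-one-out accuracy."""
    ranked = sorted(candidates, key=lambda j: scores[j], reverse=True)
    best_acc, best_subset = -1.0, []
    for k in range(1, len(ranked) + 1):
        acc = loo_knn_accuracy(X, y, ranked[:k])
        if acc > best_acc:
            best_acc, best_subset = acc, ranked[:k]
    return best_subset, best_acc


def make_spectra(n=40, n_wl=30, seed=0):
    """Synthetic two-class spectra: a shared per-sample baseline offset plus
    small noise, with a class-dependent shift at wavelengths 10 to 12."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        offset = rng.gauss(0.0, 1.0)
        row = [offset + rng.gauss(0.0, 0.05) for _ in range(n_wl)]
        for j in (10, 11, 12):
            row[j] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_spectra()
    kept, scores = preselect(X, y)
    selected, acc = wrapper_select(X, y, kept, scores)
    print("kept after phase 1:", kept)
    print("selected after phase 2:", selected, "LOO accuracy:", acc)
```

Ranking the preselected wavelengths before the wrapper step means the classifier is evaluated on only a handful of nested subsets rather than on every combination, which is one plausible reason a preselection phase could shorten the overall selection time.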

Source journal

Journal of Chemometrics (Chemistry: Analytical Chemistry)

CiteScore: 5.20

Self-citation rate: 8.30%

Articles published: (not listed)

Review time: 2 months

Journal description: The Journal of Chemometrics is devoted to the rapid publication of original scientific papers, reviews and short communications on fundamental and applied aspects of chemometrics. It also provides a forum for the exchange of information on meetings and other news relevant to the growing community of scientists who are interested in chemometrics and its applications. Short, critical review papers are a particularly important feature of the journal, in view of the multidisciplinary readership at which it is aimed.