Wavelength selection method for near-infrared spectroscopy based on the combination of mutual information and genetic algorithm.

IF 5.6 | Zone 1, Chemistry | Q1 CHEMISTRY, ANALYTICAL | Talanta | Pub Date: 2025-05-01 | Epub Date: 2025-01-10 | DOI: 10.1016/j.talanta.2025.127573
Xiao-Hui Ma, Zheng-Guang Chen, Shuo Liu, Jin-Ming Liu, Xue-Song Tian
Talanta, vol. 286, article 127573. Publication type: Journal Article. https://doi.org/10.1016/j.talanta.2025.127573
Citations: 0

Abstract

Near-infrared (NIR) spectroscopy has become a widely used analytical tool in many fields because it is convenient and efficient. However, as instrument precision has improved, spectra can now span hundreds of dimensions. This high dimensionality makes modeling time-consuming and degrades model performance, so it is crucial to choose representative features carefully before constructing models. This paper addresses a limitation of filter algorithms: they can only rank features and cannot directly determine the best feature subset. The proposed hybrid method combines the Max-Relevance Min-Redundancy (mRMR) algorithm with the Genetic Algorithm (GA), and thereby filter and wrapper feature selection, to select appropriate features automatically. During each iteration of the GA, the hybrid algorithm retains the features in each individual that mRMR judges to have strong correlation and low redundancy, and deletes the features judged to have little correlation or high redundancy. Through iteration, the feature subset is continuously optimized. We apply the proposed hybrid method to two datasets and build several models to verify it. The experimental results indicate that the feature selection approach combining mRMR with the GA retains the advantages of both methods and selects features that show good predictive performance. Compared with other common feature selection methods, such as Uninformative Variable Elimination (UVE), Competitive Adaptive Reweighted Sampling (CARS), the Successive Projections Algorithm (SPA), Iteratively Retains Informative Variables (IRIV), and the GA, the hybrid algorithm selects a larger number of feature variables that are both representative and informative, and it significantly enhances the predictive performance of the model.
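The loop described in the abstract (an mRMR-based pruning step inside every GA generation) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not specify the mutual information estimator, the pruning rule, the GA operators, or the wrapper model. The histogram-based estimator, the "keep features with a positive mRMR score" threshold, the leave-one-out 1-nearest-neighbour fitness, and all function names below are assumptions chosen to keep the example dependency-free.

```python
import math
import random
from collections import Counter


def discretize(values, bins=4):
    """Equal-width binning so mutual information can be estimated by counting."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant column: everything lands in bin 0
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_info(a, b):
    """Plug-in estimate of I(a; b) in nats for two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())


def mrmr_scores(cols, target, subset):
    """Relevance to the target minus mean redundancy with the rest of the subset."""
    scores = {}
    for j in subset:
        others = [k for k in subset if k != j]
        redundancy = (sum(mutual_info(cols[j], cols[k]) for k in others)
                      / len(others)) if others else 0.0
        scores[j] = mutual_info(cols[j], target) - redundancy
    return scores


def prune(individual, cols, target):
    """Keep features with a positive mRMR score (assumed rule); never return empty."""
    scores = mrmr_scores(cols, target, individual)
    kept = {j for j in individual if scores[j] > 0}
    return kept or {max(individual, key=scores.get)}


def fitness(subset, X, y):
    """Assumed wrapper criterion, lower is better: leave-one-out 1-NN squared error."""
    feats, err = sorted(subset), 0.0
    for i, xi in enumerate(X):
        nearest = min((k for k in range(len(X)) if k != i),
                      key=lambda k: sum((xi[j] - X[k][j]) ** 2 for j in feats))
        err += (y[i] - y[nearest]) ** 2
    return err / len(X)


def hybrid_select(X, y, pop_size=12, generations=15, seed=0):
    """GA over feature subsets with an mRMR pruning step in every generation."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    cols = [discretize([row[j] for row in X]) for j in range(n_feat)]
    target = discretize(y)
    pop = [{j for j in range(n_feat) if rng.random() < 0.5} or {rng.randrange(n_feat)}
           for _ in range(pop_size)]
    for _ in range(generations):
        pop = [prune(ind, cols, target) for ind in pop]  # filter step (mRMR)
        pop.sort(key=lambda ind: fitness(ind, X, y))     # wrapper step
        parents = pop[:pop_size // 2]                    # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each feature bit is copied from a random parent.
            child = {j for j in range(n_feat) if j in (a if rng.random() < 0.5 else b)}
            flip = rng.randrange(n_feat)                 # mutation: flip one bit
            child ^= {flip}
            children.append(child or {flip})
        pop = parents + children
    return min(pop, key=lambda ind: fitness(ind, X, y))
```

Here `X` is a list of spectra (one row per sample, one column per wavelength) and `y` the reference values; `hybrid_select(X, y)` returns a set of selected column indices. In practice the fitness would more plausibly be the cross-validated error of a calibration model, and the mutual information estimator would be chosen to suit continuous spectra.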

Source journal
Talanta (Chemistry: Analytical Chemistry)
CiteScore: 12.30
Self-citation rate: 4.90%
Articles published: 861
Review time: 29 days
Journal description: Talanta provides a forum for the publication of original research papers, short communications, and critical reviews in all branches of pure and applied analytical chemistry. Papers are evaluated based on established guidelines, including the fundamental nature of the study, scientific novelty, substantial improvement or advantage over existing technology or methods, and demonstrated analytical applicability. Original research papers on fundamental studies, and on novel sensor and instrumentation developments, are encouraged. Novel or improved applications in areas such as clinical and biological chemistry, environmental analysis, geochemistry, materials science and engineering, and analytical platforms for omics development are welcome. Analytical performance of methods should be determined, including interference and matrix effects, and methods should be validated by comparison with a standard method, or analysis of a certified reference material. Simple spiking recoveries may not be sufficient. The developed method should especially comprise information on selectivity, sensitivity, detection limits, accuracy, and reliability. However, applying official validation or robustness studies to a routine method or technique does not necessarily constitute novelty. Proper statistical treatment of the data should be provided. Relevant literature should be cited, including related publications by the authors, and authors should discuss how their proposed methodology compares with previously reported methods.
Latest articles from this journal
Cascade-amplification-based electrochemical detection of Akashiwo sanguinea at pre-outbreak stage.
Fully printed field-effect transistor humidity sensor with chitosan/polyvinyl alcohol/nano carbon powder for enhanced moisture sensitivity.
Lab-created conductive filament based on nickel and graphite particles: An attractive material for the additive manufacture of enhanced electrochemical sensors for non-enzymatic and selective glucose sensing.
Near-field and far-field competitive couplings between plasmonic nanodisk array and nanoparticles for rapid and facile heparin assay.
An ultrasensitive homogeneous electrochemical strategy for ochratoxin A sensing based on nanoscale PCN-224@MB@Apt.