Enhancing elemental quantification in LIBS with SHAP-guided emission line analysis: A soil carbon study
Authors: Davi Keglevich Neiva, Wesley Nascimento Guedes, Ladislau Martin-Neto, Paulino Ribeiro Villas-Boas
Journal: Spectrochimica Acta Part B: Atomic Spectroscopy, volume 217, Article 106971 (published 2024-06-10)
DOI: 10.1016/j.sab.2024.106971
URL: https://www.sciencedirect.com/science/article/pii/S0584854724001150
Citation count: 0
Abstract
In laser-induced breakdown spectroscopy (LIBS), identifying key emission lines for accurate elemental quantification has long posed a challenge. Traditional methods rely on experimental knowledge, atomic databases, and intricate spectral analyses. Although machine learning techniques – such as boosting algorithms and neural networks – offer efficient processing for large datasets, the complexity of these techniques often compromises interpretability. To address this issue, our study integrates the SHapley Additive exPlanations (SHAP) algorithm with gradient boosting models in order to interpret the most important spectral features, thus enhancing our understanding of how specific emission lines contribute to the carbon (C) concentration predictions in soils. Deployed on a large dataset of 1019 soil samples, a wrapper method with a random forest regressor reduced the initial spectral intensity features from 13,748 to 1098. Subsequent application of a LightGBM regression model calibrated via the Optuna framework yielded – for training and validation sets, respectively – an R² of 0.98 and 0.77, and RMSE values of 1.55 and 4.54 g kg⁻¹. The SHAP summary plot showed that C emission lines influenced the model's predictions positively, as anticipated, whereas silicon (Si) emission lines produced a negative impact, suggesting a lower C concentration in sandy soils. Our findings not only validate the efficacy of SHAP in improving LIBS-based soil C quantification, but they also offer a sophisticated framework for decoding the complex interplay between emission lines and target elemental concentrations.
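The idea behind the SHAP attributions mentioned in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' pipeline: the model, its coefficients, the feature values, and all names (`toy_model`, `shapley_values`, the "C line", "Si line", and "background" features) are hypothetical, and the brute-force enumeration of feature orderings below stands in for the optimized algorithms a SHAP implementation would use on gradient boosting models. It only shows how a Shapley value splits the difference between one sample's prediction and a baseline prediction among the input features, and how a feature such as a Si line can receive a negative contribution.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, sample, baseline):
    """Exact Shapley values for one sample.

    Each feature's value is its marginal contribution to the prediction,
    averaged over every order in which features can be switched from the
    baseline value to the sample value. Cost grows as n!, so this is only
    practical for a handful of features.
    """
    n = len(sample)
    totals = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = sample[i]          # reveal feature i
            updated = predict(current)
            totals[i] += updated - previous  # marginal contribution of i
            previous = updated
    return [t / factorial(n) for t in totals]


def toy_model(features):
    """Hypothetical stand-in for a fitted regressor (coefficients invented).

    Predicted soil C (g/kg) rises with the C line intensity, falls with the
    Si line intensity, and ignores the background channel.
    """
    c_line, si_line, background = features
    return 10.0 + 8.0 * c_line - 5.0 * si_line + 0.0 * background


# Invented intensities (arbitrary units): a "sandy" sample with a strong
# Si line, compared against a baseline that plays the role of the dataset mean.
baseline = [1.0, 1.0, 0.5]
sample = [1.5, 3.0, 0.75]

phi = shapley_values(toy_model, sample, baseline)
for name, value in zip(["C line", "Si line", "background"], phi):
    print(f"{name:>10}: {value:+.2f} g/kg")

# Efficiency property: the attributions sum to the prediction shift.
shift = toy_model(sample) - toy_model(baseline)
print(f"sum of attributions = {sum(phi):+.2f}, prediction shift = {shift:+.2f}")
```

In this toy case the C line receives a positive attribution, the Si line a negative one, and the unused background channel zero. That is the same qualitative pattern the abstract reports, although here it follows directly from the invented coefficients rather than from any fitted model.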
About the journal:
Spectrochimica Acta Part B: Atomic Spectroscopy is intended for the rapid publication of both original work and reviews in the following fields:
Atomic Emission (AES), Atomic Absorption (AAS) and Atomic Fluorescence (AFS) spectroscopy.
Mass Spectrometry (MS) for inorganic analysis covering Spark Source (SS-MS), Inductively Coupled Plasma (ICP-MS), Glow Discharge (GD-MS), and Secondary Ion Mass Spectrometry (SIMS).
Laser induced atomic spectroscopy for inorganic analysis, including non-linear optical laser spectroscopy, covering Laser Enhanced Ionization (LEI), Laser Induced Fluorescence (LIF), Resonance Ionization Spectroscopy (RIS) and Resonance Ionization Mass Spectrometry (RIMS); Laser Induced Breakdown Spectroscopy (LIBS); Cavity Ringdown Spectroscopy (CRDS), Laser Ablation Inductively Coupled Plasma Atomic Emission Spectroscopy (LA-ICP-AES) and Laser Ablation Inductively Coupled Plasma Mass Spectrometry (LA-ICP-MS).
X-ray spectrometry, X-ray Optics and Microanalysis, including X-ray fluorescence spectrometry (XRF) and related techniques, in particular Total-reflection X-ray Fluorescence Spectrometry (TXRF), and Synchrotron Radiation-excited Total reflection XRF (SR-TXRF).
Manuscripts dealing with (i) fundamentals, (ii) methodology development, (iii) instrumentation, and (iv) applications can be submitted for publication.