Pub Date: 2025-02-07 | DOI: 10.1016/j.chemolab.2025.105341
Andreas Kartakoullis, Nicola Caporaso, Martin B. Whitworth, Ian D. Fisk
In this study, an ad hoc image-processing pipeline is developed for the semantic segmentation of wheat kernel data acquired by near-infrared hyperspectral imaging (HSI). The Gaussian Mixture Model (GMM), a soft clustering method, is used for this task and yields strong results for both kernel and germ segmentation. GMM was compared with two hard clustering methods, hierarchical clustering and k-means, as well as with other clustering algorithms commonly used in food HSI applications. GMM achieved the highest accuracy, with a Jaccard index of 0.745, compared with 0.698 for hierarchical clustering and 0.652 for k-means. Furthermore, the spectral variations across the wheat kernel topology can be exploited for semantic image segmentation, in particular for isolating the germ within each kernel. These findings are of practical relevance to practitioners of HSI and machine vision, particularly for food quality assessment and real-time inspection.
Title: Gaussian mixture model clustering allows accurate semantic image segmentation of wheat kernels from near-infrared hyperspectral images
Journal: Chemometrics and Intelligent Laboratory Systems, Vol. 259, Article 105341
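The Jaccard index quoted in this abstract is the intersection over union of a predicted and a reference segmentation mask. A minimal Python sketch, assuming binary masks stored as nested lists of 0/1 (a hypothetical representation, not the authors' evaluation code):

```python
def jaccard_index(pred, truth):
    """Intersection over union of two equally sized binary masks."""
    intersection = 0
    union = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            p, t = bool(p), bool(t)
            intersection += p and t  # pixel labelled in both masks
            union += p or t          # pixel labelled in either mask
    # Two empty masks agree perfectly by convention
    return intersection / union if union else 1.0


# Hypothetical 2 x 3 germ masks: 1 = germ pixel, 0 = background
predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
print(jaccard_index(predicted, reference))  # prints 0.5
```

Two pixels are shared and four are labelled in at least one mask, giving 2/4.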
Pub Date: 2025-02-04 | DOI: 10.1016/j.chemolab.2025.105342
Yujia Dai, Qing Ma, Tingsong Zhang, Shangyong Zhao, Lu Zhou, Xun Gao, Ziyuan Liu
Laser-Induced Breakdown Spectroscopy (LIBS), combined with modern machine learning tools, has emerged as a powerful technique for identifying metal materials, owing to its high sensitivity and rapid response. However, current spectral analysis methods typically involve two separate steps, dimensionality reduction followed by model learning, without seamless integration between them. This study addresses the issue with a LIBS-based discriminative learning approach using the Discriminative Restricted Boltzmann Machine (DRBM). LIBS is combined with DRBM for spectral feature selection and classification of five distinct aluminum alloys, each represented by a small number of samples. The spectral latent distribution learned by the generative component of the DRBM regularizes the discriminative process, which counters the overfitting caused by high-dimensional, small-sample data. The result is a stable, generalizable qualitative analysis model that does not depend on empirical knowledge. The approach achieves 100 % accuracy, surpassing the best-performing traditional machine learning method (PCA-RF) by 13.33 % in accuracy, with a similar improvement over a Backpropagation Neural Network (BPNN) of the same structure.
Title: Classification of aluminum alloy using laser-induced breakdown spectroscopy combined with discriminative restricted Boltzmann machine
Journal: Chemometrics and Intelligent Laboratory Systems, Vol. 258, Article 105342
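The regularising effect described in this abstract relies on the DRBM's closed-form class posterior, p(y|x) proportional to exp(d_y + sum_j softplus(c_j + U_jy + sum_i W_ji x_i)), as in the standard DRBM formulation. The sketch below evaluates only that posterior; training (including any generative regularisation term) and LIBS feature handling are omitted, and all weights in the usage example are random placeholders, not values from the study.

```python
import math
import random


def softplus(z):
    """Numerically stable log(1 + e^z)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def drbm_class_probs(x, W, U, c, d):
    """Class posterior p(y | x) of a discriminative RBM.

    x: visible (spectral) vector of length n_vis
    W: n_hid rows of n_vis input-to-hidden weights
    U: n_hid rows of n_cls class-to-hidden weights
    c: n_hid hidden biases; d: n_cls class biases
    """
    # Hidden pre-activation from the input, shared by every class
    base = [c_j + sum(w * xi for w, xi in zip(W_j, x)) for c_j, W_j in zip(c, W)]
    scores = []
    for y in range(len(d)):
        scores.append(d[y] + sum(softplus(b + U[j][y]) for j, b in enumerate(base)))
    # Softmax with a log-sum-exp shift for numerical stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical example: 6 spectral features, 4 hidden units, 5 alloy classes
rng = random.Random(0)
n_vis, n_hid, n_cls = 6, 4, 5
W = [[rng.gauss(0.0, 0.1) for _ in range(n_vis)] for _ in range(n_hid)]
U = [[rng.gauss(0.0, 0.1) for _ in range(n_cls)] for _ in range(n_hid)]
c = [0.0] * n_hid
d = [0.0] * n_cls
x = [rng.random() for _ in range(n_vis)]
print(drbm_class_probs(x, W, U, c, d))
```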
Pub Date: 2025-02-04 | DOI: 10.1016/j.chemolab.2025.105339
Joan Borràs-Ferrís, Carl Duchesne, Alberto Ferrer
We present a novel Latent Space-based Multivariate Capability Index (LSb-MCpk), aligned with the Quality by Design initiative, for ranking and selecting suppliers of a particular raw material used in a manufacturing process. Unlike other multivariate capability indices, which are defined either in the raw material space or in the space of the Critical Quality Attributes (CQAs) of the manufactured product, LSb-MCpk is defined in the latent space connecting the two. This gives the new index a clear advantage over classical ones: it quantifies each raw material supplier's capacity to assure, with a given confidence level, the CQAs of the manufactured product before a single unit of the product is made. The only requirement is a rich historical database of several raw material properties together with the corresponding CQAs. In addition, we present a novel methodology for diagnosing assignable causes when a supplier obtains a poor capability index. The proposed LSb-MCpk is based on Partial Least Squares (PLS) regression and is illustrated with data from an industrial case study and a simulation study.
Title: A latent space-based multivariate capability index: A new paradigm for raw material supplier selection in industry 4.0
Journal: Chemometrics and Intelligent Laboratory Systems, Vol. 258, Article 105339
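For context, the classical univariate process capability index that multivariate indices such as LSb-MCpk generalise is Cpk = min(USL - mu, mu - LSL) / (3 sigma). The sketch below computes only this univariate index, using the sample standard deviation as an assumed estimate of sigma; it does not reproduce the PLS latent-space construction of LSb-MCpk, and the measurements are hypothetical.

```python
import statistics


def cpk(samples, lsl, usl):
    """Univariate capability index Cpk for lower/upper specification limits."""
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)  # sample standard deviation (assumption)
    # Distance from the mean to the nearer specification limit, in units of 3 sigma
    return min(usl - mu, mu - lsl) / (3.0 * sigma)


# Hypothetical CQA measurements with specification limits 7 and 16
print(cpk([9.0, 10.0, 11.0], lsl=7.0, usl=16.0))  # prints 1.0
```

The mean (10) sits closer to the lower limit, so the lower side (distance 3, sigma 1) determines the index.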
Pub Date: 2025-02-02 | DOI: 10.1016/j.chemolab.2025.105336
Maddina Dinesh Kumar, Dharmaiah Gurram, Se-Jin Yook, C.S.K. Raju, Nehad Ali Shah
Research background and significance
Hybrid nanofluids have attracted significant attention because of their capacity to enhance heat transfer in a range of technical applications; optimising their thermal performance is crucial for improving the efficiency of cooling systems, energy storage devices, and heat exchangers with rotating surfaces.
Present study novelty and methodology
The present study investigates heat transfer, velocity, and mass diffusion in a ternary hybrid nanofluid (a base fluid containing three types of nanoparticles) flowing over a spinning disc under the Rosseland radiation and magnetic-field approximations, with the aim of increasing the heat transfer rate. Similarity transformations convert the non-linear, dimensional governing PDEs into dimensionless ODEs, which are then solved numerically with MATLAB's built-in bvp5c solver. A quadratic regression model based on response surface methodology (RSM) is used to study the effects of the independent parameters on the physical quantities of interest, and surface plots are drawn in Python.
Quantitative evaluation
The RSM quadratic regression model achieves $R^2 = 99.51\,\%$, indicating a good fit. Case 1 yields a higher skin friction coefficient ($C_f$), mass transfer rate (Sherwood number, $Sh$), and heat transfer rate (Nusselt number, $Nu_s$) than case 2.
Title: Optimising thermal performance of water-based hybrid nanofluids with magnetic and radiative effects over a spinning disc
Journal: Chemometrics and Intelligent Laboratory Systems, Vol. 258, Article 105336
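The quadratic RSM model in this abstract regresses a response on the linear, squared, and interaction terms of the factors and reports $R^2$ as its goodness-of-fit measure. A minimal pure-Python sketch for two coded factors, fitted by least squares via the normal equations; the 3 x 3 design and the coefficients used to generate the response are hypothetical, not data from the study.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def quad_terms(x1, x2):
    """Full second-order model terms: 1, x1, x2, x1^2, x2^2, x1*x2."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]


def fit_rsm(X, y):
    """Least-squares quadratic response surface; returns (coefficients, R^2)."""
    Phi = [quad_terms(*row) for row in X]
    p = len(Phi[0])
    A = [[sum(r[i] * r[j] for r in Phi) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(Phi, y)) for i in range(p)]
    beta = solve(A, b)
    pred = [sum(bi * ti for bi, ti in zip(beta, r)) for r in Phi]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


# Hypothetical 3^2 full factorial in coded units and a noise-free response
levels = (-1.0, 0.0, 1.0)
X = [(a, b) for a in levels for b in levels]
true_beta = [2.0, 0.5, -1.0, 0.3, 0.2, 0.1]
y = [sum(t * q for t, q in zip(true_beta, quad_terms(a, b))) for a, b in X]
beta, r2 = fit_rsm(X, y)
print(beta, r2)
```

With noise-free data the fit recovers the generating coefficients and $R^2$ is 1; real simulation outputs would give $R^2 < 1$.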
Pub Date: 2025-01-31 | DOI: 10.1016/j.chemolab.2025.105322
Karol I. Santoro, Yolanda M. Gómez, Héctor J. Gómez, Diego I. Gallardo
In this paper, we introduce a new class of unit models defined on the open unit interval. After reparameterization, the location parameter can be interpreted as a quantile of the distribution. This makes it possible to assess the effect of explanatory variables on the conditional quantiles of the dependent variable, offering an alternative to the Kumaraswamy quantile regression model. The quantile regression approach is applied to two environmental datasets. We evaluate the new models in scenarios with and without covariates, comparing the results with those of the Kumaraswamy regression model. The proposed method has been implemented in an R package.
Title: A new class of unit models with a quantile regression approach applied to contamination data
Journal: Chemometrics and Intelligent Laboratory Systems, Vol. 258, Article 105322
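The Kumaraswamy benchmark has CDF $F(y) = 1 - (1 - y^a)^b$ on $(0, 1)$, so requiring $F(\mu) = \tau$ gives $b = \log(1 - \tau) / \log(1 - \mu^a)$, which makes $\mu$ the $\tau$-th quantile. The sketch below shows this quantile reparameterisation with a logit link for covariates, an assumed (though common) choice. It illustrates only the Kumaraswamy baseline, not the new class of unit models proposed in the paper, and the covariate values are hypothetical.

```python
import math


def kw_cdf(y, a, b):
    """Kumaraswamy CDF on (0, 1)."""
    return 1.0 - (1.0 - y ** a) ** b


def kw_quantile(tau, a, b):
    """Kumaraswamy quantile function, the inverse of kw_cdf."""
    return (1.0 - (1.0 - tau) ** (1.0 / b)) ** (1.0 / a)


def b_from_quantile(mu, tau, a):
    """Shape b chosen so that mu is the tau-th quantile, i.e. F(mu) = tau."""
    return math.log(1.0 - tau) / math.log(1.0 - mu ** a)


def mu_from_covariates(x, beta):
    """Hypothetical logit link: logit(mu) = beta0 + beta1*x1 + ..."""
    eta = beta[0] + sum(bj * xj for bj, xj in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-eta))


# Conditional median (tau = 0.5) for one hypothetical observation
mu = mu_from_covariates([0.4, 1.2], [-1.0, 0.8, 0.3])
a = 2.0
b = b_from_quantile(mu, 0.5, a)
print(mu, kw_quantile(0.5, a, b))  # both numbers agree up to rounding
```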
Pub Date: 2025-01-27 | DOI: 10.1016/j.chemolab.2025.105335
Bader Huwaimel, Saad Alqarni
Poly(lactic-co-glycolic acid) (PLGA) is one of the most commonly used polymers for drug delivery because it is biodegradable. Producing PLGA particles at the nanoscale is important for exploiting this polymer in nano-based drug delivery. This work explores machine learning methods for two regression tasks in the synthesis process: predicting particle size (nm) and Zeta potential (mV). Using a dataset with categorical inputs (PLGA type and anti-solvent type) and numerical inputs (PLGA concentration and anti-solvent concentration), the workflow applies Isolation Forest for outlier detection, and Min-Max normalization and One-Hot Encoding for preprocessing. Several regression models, including LASSO, Polynomial Regression (PR), and Support Vector Regression (SVR), were combined with Bagging ensembles to improve predictive performance, and Glowworm Swarm Optimization (GSO) was used for hyperparameter tuning. BAG-SVR attained the highest test $R^2$ of 0.9422 for particle size prediction. For Zeta potential prediction, BAG-PR outperformed the other models, with a test $R^2$ of 0.98881.
Title: Design of Poly(lactic-co-glycolic acid) nanoparticles in drug delivery by artificial intelligence methods to find the conditions of nanoparticles synthesis
Journal: Chemometrics and Intelligent Laboratory Systems, Vol. 258, Article 105335
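The Min-Max normalization and One-Hot Encoding steps in this abstract can be sketched in a few lines of Python. The PLGA grades, anti-solvent names, and concentrations in the example are hypothetical placeholders. In a real pipeline the minimum and maximum should be estimated on the training split only and reused for the test data.

```python
def min_max(values):
    """Scale numbers linearly to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def one_hot(labels):
    """Return the sorted category list and one binary indicator row per label."""
    cats = sorted(set(labels))
    return cats, [[1 if lab == c else 0 for c in cats] for lab in labels]


# Hypothetical rows: (PLGA type, anti-solvent type, PLGA conc., anti-solvent conc.)
rows = [
    ("PLGA-A", "solvent-1", 2.0, 10.0),
    ("PLGA-B", "solvent-2", 4.0, 20.0),
    ("PLGA-A", "solvent-2", 6.0, 30.0),
]
plga_cats, plga_enc = one_hot([r[0] for r in rows])
solv_cats, solv_enc = one_hot([r[1] for r in rows])
conc_plga = min_max([r[2] for r in rows])
conc_solv = min_max([r[3] for r in rows])
features = [p + s + [c1, c2] for p, s, c1, c2 in zip(plga_enc, solv_enc, conc_plga, conc_solv)]
print(features)
```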
Pub Date: 2025-01-25 | DOI: 10.1016/j.chemolab.2025.105334
M.A. Meneses-Nava
This study introduces a spectral analysis method, Boosted Deconvolution Fitting (BDF), for processing spectroscopic data. BDF enhances spectral resolution and fits spectra precisely by combining boosted deconvolution, which determines the band profile parameters, with a multicomponent analysis technique that fine-tunes the band intensities. The method aims to overcome the shortcomings of conventional approaches such as the Levenberg-Marquardt algorithm (LMA), in particular by improving spectral resolution, accurately determining the parameters of overlapping bands, and reducing sensitivity to initial conditions. The performance of BDF depends on several factors, including the chosen band profile (Gaussian or Lorentzian), the signal-to-noise ratio (SNR) of the data, and the separation and relative intensities of the spectral bands.
Title: Automatic spectral fitting for LIBS and Raman spectra by boosted deconvolution method
Journal: Chemometrics and Intelligent Laboratory Systems, Vol. 258, Article 105334
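Because the performance of BDF depends on the chosen band profile, the two profiles are written out below. The sketch gives the standard area-normalised Gaussian and Lorentzian line shapes, parameterised by centre and full width at half maximum (FWHM), and sums them into a spectrum of overlapping bands. This is the forward model that a fitting method (LMA or BDF) inverts, not the BDF algorithm itself, and the band positions are hypothetical.

```python
import math


def gaussian(x, x0, fwhm, area=1.0):
    """Area-normalised Gaussian band centred at x0."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return area / (sigma * math.sqrt(2.0 * math.pi)) * math.exp(-0.5 * ((x - x0) / sigma) ** 2)


def lorentzian(x, x0, fwhm, area=1.0):
    """Area-normalised Lorentzian band centred at x0."""
    gamma = fwhm / 2.0  # half width at half maximum
    return area / math.pi * gamma / ((x - x0) ** 2 + gamma ** 2)


def spectrum(x, bands, profile):
    """Sum of bands given as (centre, fwhm, area) tuples."""
    return sum(profile(x, x0, w, a) for x0, w, a in bands)


# Two hypothetical overlapping bands, 3 units apart with FWHM 4
bands = [(500.0, 4.0, 1.0), (503.0, 4.0, 0.5)]
grid = [495.0 + 0.5 * i for i in range(27)]
gauss_model = [spectrum(x, bands, gaussian) for x in grid]
lorentz_model = [spectrum(x, bands, lorentzian) for x in grid]
```

At the same FWHM the Lorentzian has much heavier wings, which is one reason the choice of profile matters when bands overlap.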
Pub Date: 2025-01-24 | DOI: 10.1016/j.chemolab.2025.105333
Min-Hsu Tai, Cheng-Che Hsu
This study presents a generative adversarial network (GAN) that generates high-resolution (HR) spectra from low-resolution (LR) spectra. Plasma emission from the second positive system of nitrogen is used for demonstration. Specair™ is used to generate pairs of HR and LR spectra as training data, covering rotational temperatures ($T_{\mathrm{rot}}$) from 300 to 1200 K and vibrational temperatures ($T_{\mathrm{vib}}$) from 2000 to 6500 K. Optical emission spectra from low-pressure and atmospheric-pressure plasmas serve as test data to demonstrate that the model can generate HR spectra from spectra acquired with LR spectrometers. Feature matching is used during training to address instability, and the distributions of the discriminator scores serve as an initial criterion for monitoring the training procedure. The weighted coefficient of determination ($\bar{R}^2$) between the simulated and generated HR spectra exceeds 0.9999. The fitting errors for $T_{\mathrm{rot}}$ and $T_{\mathrm{vib}}$ between the generated HR spectra and experimental HR spectra acquired with an HR spectrometer are mostly below 5 %. These results indicate that the GAN is an efficient way to obtain HR spectra when no HR spectrometer is available.
Title: Reconstructing spectral shapes with GAN models: A data-driven approach for high-resolution spectra from low-resolution spectrometers
Journal: Chemometrics and Intelligent Laboratory Systems, Vol. 258, Article 105333
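The weighted coefficient of determination reported in this abstract generalises $R^2$ by weighting each spectral point: $\bar{R}^2 = 1 - \sum_i w_i (y_i - \hat{y}_i)^2 / \sum_i w_i (y_i - \bar{y}_w)^2$ with $\bar{y}_w = \sum_i w_i y_i / \sum_i w_i$. The abstract does not state which weights were used, so the sketch below assumes a generic per-point weight vector; weighting by reference intensity in the usage example is a hypothetical choice.

```python
def weighted_r2(y_true, y_pred, weights):
    """Weighted coefficient of determination between two spectra."""
    wsum = sum(weights)
    ybar = sum(w * y for w, y in zip(weights, y_true)) / wsum  # weighted mean
    ss_res = sum(w * (y - p) ** 2 for w, y, p in zip(weights, y_true, y_pred))
    ss_tot = sum(w * (y - ybar) ** 2 for w, y in zip(weights, y_true))
    return 1.0 - ss_res / ss_tot


# Hypothetical simulated vs generated intensities, weighted by simulated intensity
simulated = [0.1, 0.4, 1.0, 0.6, 0.2]
generated = [0.12, 0.39, 0.98, 0.61, 0.21]
print(weighted_r2(simulated, generated, weights=simulated))
```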
Pub Date: 2025-01-23 | DOI: 10.1016/j.chemolab.2025.105324
Zhaoxuan Pan, Xiaoyu Zhao, Yue Zhao, Lijing Cai, Liang Tong, Zhe Zhai
The Competitive Adaptive Re-weighted Sampling (CARS) method excels at feature extraction but faces several challenges with low-quality data, including high computational complexity, intricate parameter settings, and a risk of overfitting. To address these issues, this paper introduces the IWCARS (Initial Weight and Weight, I & W) algorithm, which makes two key methodological enhancements: initial weight selection and a weight update strategy. Building on the traditional CARS algorithm and density-based clustering, IWCARS supports feature selection by computing density and weights, and uses an adaptive model evaluation mechanism to select the most pertinent features, yielding a model with improved predictive capability. By dynamically adjusting the feature set, IWCARS improves both prediction performance and model fit. The IWCARS method was combined with a Partial Least Squares (PLS) model to measure soil Available Potassium (AK) content by near-infrared spectroscopy. Five preprocessing techniques were applied to the near-infrared spectra, and the model built on first-derivative data (1st Derivative + IWCARS + PLS) performed best, with $R_C^2$ = 0.9905, $R_p^2$ = 0.9817, RMSEC = 0.8917, RMSEP = 0.9024, and RPD = 8.5176. Robustness, versatility, and transferability tests showed that IWCARS integrated into the PLS model achieves good measurement accuracy.
Few existing strategies address high computational complexity, difficult parameter settings, and overfitting risk simultaneously; this study mitigates these concerns by reducing the computational complexity of CARS, simplifying its parameter settings, and preventing overfitting, thereby improving the model's fitting accuracy, training speed, and generalization capability.
{"title":"An enhanced IWCARS method for measuring soil available potassium","authors":"Zhaoxuan Pan , Xiaoyu Zhao , Yue Zhao , Lijing Cai , Liang Tong , Zhe Zhai","doi":"10.1016/j.chemolab.2025.105324","DOIUrl":"10.1016/j.chemolab.2025.105324","url":null,"abstract":"<div><div>The Competitive Adaptive Re-weighted Sampling (CARS) method, while excelling in feature extraction, encounters several challenges when processing low-quality data, including high computational complexity, intricate parameter settings, and the potential for overfitting. To address these issues, this paper introduces the IWCARS (Initial Weight and Weight, I & W) algorithm, which implements two key methodological enhancements: initial weight selection and weight update strategy. This algorithm, building upon the traditional CARS algorithm and density-based clustering, offers a supplementary tool for data feature selection by computing density and weight, and employs an adaptive model evaluation mechanism to select the most pertinent features, ultimately constructing a model with enhanced predictive capability. IWCARS optimizes model performance by dynamically adjusting the feature set, thereby improving the algorithm's prediction performance and model fit. Furthermore, the IWCARS method, in conjunction with a Partial Least Squares (PLS) model, was applied to measure soil Available Potassium (AK) content using near-infrared spectroscopy. Five pre-processing techniques were conducted on the near-infrared spectrum, with the IWCARS + PLS model constructed using first derivative data, yielding optimal results. The experimental results demonstrated that the model based on 1st Derivative + IWCARS + PLS yielded the best performance. Specifically, the model achieved R<sub>C</sub><sup>2</sup> of 0.9905, R<sub>p</sub><sup>2</sup> of 0.9817, RMSEC of 0.8917, RMSEP of 0.9024, and RPD of 8.5176. Robustness, versatility, and transferability tests demonstrated that the proposed IWCARS algorithm, when integrated into the PLS model, achieved commendable measurement accuracy. While there are limited strategies for concurrently addressing high computational complexity, challenging parameter settings, and overfitting risks, this study aims to mitigate these concerns by reducing the computational complexity of the CARS algorithm, simplifying parameter settings, and preventing overfitting, ultimately enhancing the model's fitting accuracy, training speed, and generalization capability.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"258 ","pages":"Article 105324"},"PeriodicalIF":3.7,"publicationDate":"2025-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143183732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
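The IWCARS abstract reports its results as R_C², R_p², RMSEC, RMSEP, and RPD after first-derivative preprocessing. The sketch below computes these figures of merit with NumPy under common chemometric definitions, which are assumptions here: RMSEC and RMSEP are root-mean-square errors on the calibration and prediction sets, R² = 1 - SS_res/SS_tot, and RPD is the standard deviation of the reference values (sample SD, ddof = 1; some authors use the population SD) divided by RMSEP. The abstract does not say how the first derivative was computed, so `first_derivative` uses plain finite differences via `np.gradient`; a Savitzky-Golay derivative is a common alternative.

```python
import numpy as np


def first_derivative(spectra):
    """First derivative of each spectrum (rows) along the wavelength axis (columns)."""
    return np.gradient(np.asarray(spectra, float), axis=1)


def rmse(y_true, y_pred):
    """Root-mean-square error: RMSEC on calibration data, RMSEP on prediction data."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rpd(y_true, y_pred):
    """Ratio of performance to deviation: sample SD of reference values / RMSEP."""
    return float(np.std(np.asarray(y_true, float), ddof=1) / rmse(y_true, y_pred))
```

For example, `rmse([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])` is about 0.158, `r2` gives 0.98, and `rpd` gives about 8.16. Under the RPD definition assumed above, the reported RPD of 8.5176 and RMSEP of 0.9024 imply a prediction-set reference SD of about 8.5176 × 0.9024 ≈ 7.69, in the units of the AK measurements.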
Pub Date: 2025-01-21; DOI: 10.1016/j.chemolab.2025.105323
Christophe Bajan, Guillaume Lambard
We present MADGUI (Multi-Application Design Graphical User Interface), a GUI that combines Bayesian optimization with a prediction model to analyze data and optimize processes or compositions. Its main strength is its user-friendly design, which requires no programming knowledge. Built with the Streamlit library in Python, it is divided into three parts that let users select various parameters and fill in CSV/XLSX files without writing any code. Overall, MADGUI is designed as an optimal experimental design platform driven by active machine learning: it accelerates the discovery of optimal solutions and offers an intuitive GUI for users with no experience in coding, machine learning, or optimization.
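The MADGUI abstract does not describe which surrogate model or acquisition function it uses, so the following is only a generic sketch of the kind of Bayesian-optimization loop such a tool wraps: a Gaussian-process surrogate with a squared-exponential (RBF) kernel, the expected-improvement (EI) acquisition, and a candidate grid over one hypothetical process variable scaled to [0, 1]. The length scale, jitter, grid, and objective are assumptions chosen for illustration, and none of the function names reflect MADGUI's code or API.

```python
import math

import numpy as np


def rbf_kernel(a, b, length_scale):
    """Squared-exponential (RBF) kernel between two 1-D arrays of inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_query, length_scale=0.2, noise=1e-6):
    """Posterior mean and std of a zero-mean, unit-variance GP at x_query."""
    K = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^-1 y
    K_s = rbf_kernel(x_train, x_query, length_scale)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 1e-12, None)
    return mean, np.sqrt(var)


def expected_improvement(mean, std, best, xi=0.01):
    """Expected improvement over the best observed value (maximization)."""
    gain = mean - best - xi
    z = gain / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return gain * cdf + std * pdf


def bayes_opt(objective, n_init=3, n_iter=15, seed=0):
    """Maximize `objective` on [0, 1] with a GP surrogate and EI acquisition."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)  # candidate settings (hypothetical 1-D design space)
    x = rng.uniform(0.0, 1.0, n_init)  # initial random experiments
    y = np.array([objective(v) for v in x])
    for _ in range(n_iter):
        scale = y.std() if y.std() > 0 else 1.0
        y_norm = (y - y.mean()) / scale  # standardize observations for the unit-variance GP
        mean, std = gp_posterior(x, y_norm, grid)
        x_next = grid[np.argmax(expected_improvement(mean, std, y_norm.max()))]
        x = np.append(x, x_next)  # "run" the suggested experiment
        y = np.append(y, objective(x_next))
    best = int(np.argmax(y))
    return x[best], y[best]
```

For instance, `bayes_opt(lambda v: -(v - 0.3) ** 2)` should return a setting close to 0.3. In an active-learning tool, `objective` would be replaced by the user entering measured results for each suggested setting, which is the step a CSV/XLSX-driven interface can take over.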
{"title":"MADGUI: Multi-Application Design Graphical User Interface for active learning assisted by Bayesian optimization","authors":"Christophe Bajan, Guillaume Lambard","doi":"10.1016/j.chemolab.2025.105323","DOIUrl":"10.1016/j.chemolab.2025.105323","url":null,"abstract":"<div><div>We present MADGUI, Multi-Application Design Graphical User Interface (GUI) using Bayesian Optimization and prediction model for data analysis and optimize process or composition. Its strength is its user-friendly design, which requires no programming knowledge. It is built using the Streamlit library in Python and is divided into three parts, allowing users to select various parameters and fill csv/xlsx files without any coding required. Overall, MADGUI is designed as an optimal experiment design platform with active machine learning, which accelerates the discovery of optimal solutions and provides an intuitive GUI for users with no experience in coding, machine learning, or optimization.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"258 ","pages":"Article 105323"},"PeriodicalIF":3.7,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143183720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}