Pub Date : 2025-12-22 DOI: 10.1016/j.chemolab.2025.105622
Leonardo J. Duarte, Gustavo G. Marcheafave, Elis D. Pauli, Ieda S. Scarminio, Roy E. Bruns
Two-level factorial design spectra can be transformed into t spectral representations to analyze changes in metabolic abundances owing to environmental impacts. This transformation involves performing a paired t-test for each spectroscopic variable. These tests are sensitive to deviations from normality of the spectral data as well as to heterogeneous variances of the data at different factorial design levels. Although existing spectral information for metabolites can help guide interpretive efforts, permutation calculations can be performed to obtain statistical significance and t values of metabolic peaks that are expected to be less sensitive to these assumptions. The results of these calculations are reported here and compared with parametric statistical values for 13,501 NMR spectral variables from two-level factorial design data of ethanol, dichloromethane and ethanol-dichloromethane (1:1) mixture extracts of yerba mate leaf samples. All t-representation peaks found to be statistically significant by parametric calculations are confirmed by the permutation calculations. Permutation results do not indicate any new significant peaks that were not predicted by the parametric results. As such, permutation calculations are recommended to validate results obtained from parametric determinations of statistical significance.
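As an illustration of the idea of a permutation counterpart to the paired t-test, the following is a minimal sketch for a single spectral variable. It is not the authors' procedure: it assumes a sign-flip permutation scheme with exact enumeration of all sign patterns, and the function names and the example differences are hypothetical.

```python
from itertools import product
from math import copysign, sqrt
from statistics import mean, stdev


def paired_t(diffs):
    """Paired t statistic: mean difference divided by its standard error."""
    m = mean(diffs)
    sd = stdev(diffs)
    if sd == 0:
        # Degenerate case: all differences identical.
        return 0.0 if m == 0 else copysign(float("inf"), m)
    return m / (sd / sqrt(len(diffs)))


def sign_flip_p_value(diffs):
    """Exact two-sided permutation p-value for paired differences.

    Under the null hypothesis the sign of each paired difference is
    arbitrary, so all 2**n sign patterns are treated as equally likely.
    """
    observed = abs(paired_t(diffs))
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        flipped = [s * d for s, d in zip(signs, diffs)]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(paired_t(flipped)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical high-level minus low-level intensities at one variable.
diffs = [0.8, 1.1, 0.9, 1.3, 0.7, 1.0]
print(paired_t(diffs), sign_flip_p_value(diffs))
```

For a full spectrum the same calculation would be repeated for each variable. With many pairs, exact enumeration becomes expensive and a random sample of sign patterns is the usual substitute.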
Comparison of parametric and permutation t spectral representations for determining individual metabolite abundances from factorial design spectra. Chemometrics and Intelligent Laboratory Systems, vol. 269, Article 105622.
Pub Date : 2025-12-18 DOI: 10.1016/j.chemolab.2025.105621
Rodrigo Canarin de Oliveira, Hector Hernan Hernandez Zarta, Wargner Alonso Moreno Losada, Sebastián Javier Caruso, Hágata Cremasco, Evandro Bona, Douglas N. Rutledge, Diego Galvan
Soybean is a major global commodity. Given its importance, ensuring traceability becomes essential. Genetic, climatic, and soil-related factors influence its chemical composition. Integrating multi-source data using a multiblock analysis represents a powerful approach to differentiating soybeans and monitoring their traceability. This study employed an extension of the ComDim method (also known as Common Components and Specific Weights Analysis, CCSWA) to simultaneously differentiate 20 Brazilian soybean varieties, conventional and transgenic, based on cultivation region and cultivation type. The extension replaced the PCA (Principal Components Analysis) used in classical ComDim by CCA (Common Components Analysis). Forty samples cultivated in Londrina and Ponta Grossa (Paraná, Brazil) were analyzed for their fatty acid, amino acid, isoflavone, and mineral profiles using GC-FID, IEC, HPLC-DAD, and ICP-OES. The CCA-based ComDim results revealed that Common Component 2 (CC2) was primarily responsible for distinguishing the geographical regions of Londrina and Ponta Grossa. The global loadings of CC2 indicated that zinc (Zn), manganese (Mn), oleic acid, arginine, and malonyl genistin were the most influential variables in this component. In contrast, CC3 was associated with differentiating conventional and transgenic cultivars. The global loadings highlighted linoleic acid, oleic acid, α-linolenic acid, malonyl glycitin, malonyl genistin, Fe, Zn, and Mn as the most relevant contributors. The combined CC2 and CC3 plots indicated tendencies toward differentiation of soybean samples by cultivation region and cultivation type. This result highlights the potential of CCA-based ComDim as an effective tool for soybean traceability.
A multi-source data integration for soybean differentiation through multiblock data analysis using a novel adaptation of ComDim. Chemometrics and Intelligent Laboratory Systems, vol. 269, Article 105621.
Pub Date : 2025-12-18 DOI: 10.1016/j.chemolab.2025.105619
Mohamed El-dosuky, Aboul Ella Hassanien, Heba Alshater, Rania Ahmed, Sameh H. Basha, Heba AboulElla, Ashraf Darwish, Sara Abdelghafar
This paper introduces a neutrosophic deep learning model for automated detection and classification of palladium nanoparticles in scanning electron microscopy (SEM) images, distinguishing between ordered and disordered structures for accurate nanoparticle characterization. The model follows a five-phase pipeline for enhanced accuracy and efficiency. It begins with data augmentation, applying transformations such as rotation and flipping to improve dataset diversity. The second phase uses neutrosophic image segmentation to manage uncertainty and noise in SEM images, allowing for the precise isolation of nanoparticle regions. In the third phase, the VGG-19 deep neural network extracts high-level features, initially identifying 25,088 features. In the fourth phase, a hybrid approach combining Gini importance and Genetic Optimized Rough Sets (GORS) reduces the number of features to 2454. In the fifth phase, the refined feature set is classified using a Random Forest classifier, which effectively distinguishes between ordered and disordered palladium nanoparticles. To validate its performance, the proposed model was evaluated on a dataset of 1000 SEM images of carbon-based materials with deposited palladium nanoparticles, which was then expanded to 1500 images to address class imbalance and minimize overfitting. The experimental results highlight the model's strong potential as a high-performance classification tool for nanoparticle analysis in SEM images, achieving an overall accuracy of 99.67 %. To evaluate the impact of the introduced phases on the proposed model's performance, four ablation experiments were conducted, demonstrating the significance of each phase. Dropping data augmentation and feature reduction reduced accuracy to approximately 97.5 %, while dropping the feature extraction phase reduced it further to 94.17 %, highlighting the critical impact of these processes on performance and robustness.
Detecting and classifying palladium nanoparticles in microscopic images using neutrosophic deep learning. Chemometrics and Intelligent Laboratory Systems, vol. 269, Article 105619.
Pub Date : 2025-12-17 DOI: 10.1016/j.chemolab.2025.105620
Meixuan Zhao, Pengcheng Gu, Yuwang Han
Photoacoustic spectroscopy (PAS) is a powerful technique for detecting trace gas mixtures, with applications spanning industrial safety, environmental monitoring, and energy systems. However, when it is applied to three crucial indicator gases, methane (CH₄), ethane (C₂H₆), and ethylene (C₂H₄), strong spectral overlaps introduce cross-interference that complicates accurate concentration retrieval. To address limitations of conventional chemometric and machine learning approaches, such as poor generalization across concentration ranges and vulnerability to interference, this study proposes a hybrid model integrating Support Vector Machine (SVM) classification with Chaotic Particle Swarm Optimization (CPSO)-enhanced Kernel Extreme Learning Machine (KELM). The workflow includes wavelet-based denoising, feature selection via Competitive Adaptive Reweighted Sampling (CARS), dynamic thresholding by SVM to partition samples into high- and low-concentration regimes, and a final regression analysis using KELM. The proposed approach significantly improves detection accuracy across a wide concentration range (0.5–500 ppm). Experimental results show that the SVM-CPSO-KELM model achieves an average prediction error of 5.44 %, with a maximum error below 14.37 %.
A hybrid SVM-CPSO-KELM model for the simultaneous detection of methane, ethane, and ethylene via photoacoustic spectroscopy. Chemometrics and Intelligent Laboratory Systems, vol. 269, Article 105620.
Pub Date : 2025-12-17 DOI: 10.1016/j.chemolab.2025.105617
Mohammed Saif Ismail Hameed, Robin van der Haar, Ying Chen, Peter Goos
When experimental tests differ in cost and the experiment is constrained by a fixed total budget, the optimal number of tests and the allocation between expensive and inexpensive tests cannot be determined a priori. We propose using a Variable Neighborhood Search (VNS) algorithm to generate optimal experimental designs for such problems. VNS is an intuitive and flexible metaheuristic that has been successfully applied to a wide range of optimization problems. We illustrate the effectiveness of the VNS algorithm by generating improved designs for a micronization experiment.
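To make the budget constraint concrete, the sketch below enumerates the allocations between expensive and inexpensive tests that exhaust a fixed budget as far as possible. It only illustrates why the total number of tests is not known in advance. It is not the VNS algorithm of the paper, and the function name, costs and budget are hypothetical.

```python
def feasible_allocations(budget, cost_expensive, cost_cheap):
    """Return (n_expensive, n_cheap) pairs that use the budget as fully as possible.

    For each feasible number of expensive tests, the remaining budget is
    spent on as many inexpensive tests as it allows. Integer costs are
    assumed so that the arithmetic is exact.
    """
    allocations = []
    max_expensive = budget // cost_expensive
    for n_expensive in range(max_expensive + 1):
        remaining = budget - n_expensive * cost_expensive
        n_cheap = remaining // cost_cheap
        allocations.append((n_expensive, n_cheap))
    return allocations


# Hypothetical example: budget of 10 units, expensive test = 3, cheap test = 1.
for n_expensive, n_cheap in feasible_allocations(10, 3, 1):
    print(n_expensive, n_cheap, "total runs:", n_expensive + n_cheap)
```

Each allocation implies a different run size (here 10, 8, 6 or 4 runs), so a design search has to compare designs of different sizes rather than optimize a design of fixed size.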
Optimal design of experiments when not every test is equally expensive. Chemometrics and Intelligent Laboratory Systems, vol. 269, Article 105617.
Pub Date : 2025-12-17 DOI: 10.1016/j.chemolab.2025.105614
Haoran Li, Xin Zhang, Pengchegn Wu, Yang Zhang, Jiyong Shi, Xiaobo Zou
Advances in spectral techniques have generated high-resolution data with thousands of variables. Although an increasing number of variables provides more comprehensive molecular information, it also poses challenges for existing chemometric methods, such as the risk of over-fitting and a lack of interpretability. Therefore, we propose a hybrid variable selection approach specifically designed for large-scale datasets. First, considering the continuous characteristics of spectral variables and their importance, interval partial least squares (iPLS) and variable combination population analysis (VCPA) were applied to select relevant variables while reducing the variable space. Second, we assume that truly relevant variables exhibit consistent importance across the sample domain for the same analytical task and are therefore more likely to be selected and retained. Consequently, a cross-domain constrained ensemble (CCE) strategy is developed using the least absolute shrinkage and selection operator (LASSO) to further enhance the performance of variable selection. Experiments on wine ¹H NMR and pork Raman spectroscopy datasets demonstrate that the proposed method improves prediction performance in terms of RMSEP and RPD. In addition, the proposed CCE method yields a larger improvement in prediction performance than other final selection methods. These results confirm the effectiveness of both the hybrid variable selection framework and the CCE strategy in handling large-scale spectral datasets.
A hybrid variable selection with cross-domain constrained ensemble (CCE) for large-scale spectroscopic data. Chemometrics and Intelligent Laboratory Systems, vol. 269, Article 105614.
Pub Date : 2025-12-15 DOI: 10.1016/j.chemolab.2025.105618
Ieda S. Scarminio, Roy E. Bruns
A short history of the beginnings of chemometric activities in Brazil, together with early international interactions, is presented. Early research efforts on mainframe computers, 8-bit microcomputers and the first 16-bit microcomputers are described. The rapid growth of chemometrics in Brazil that resulted from readily available software is discussed very briefly.
Chemometrics in Brazil: The early days. Chemometrics and Intelligent Laboratory Systems, vol. 269, Article 105618.
Pub Date : 2025-12-12 DOI: 10.1016/j.chemolab.2025.105615
Yinran Xiong, Jie Tang, Guangming Qiu, Peng Wang, Yuncan Chen, Jing Jing, Lijun Zhu
Agricultural products often exhibit substantial batch-to-batch variability in their chemical and physical properties due to environmental and other uncontrollable factors, making robust quality monitoring essential for ensuring product consistency and stability. Near-infrared (NIR) spectroscopy offers rich chemical and physical information for qualitative quality assessment, but its high dimensionality and the scarcity of abnormal samples (non-conforming products are not intentionally manufactured) limit the applicability of conventional supervised learning approaches. To address these challenges, this study proposes Covariance-Shrunk Slow Feature Analysis (CSSFA), a novel unsupervised learning method that integrates covariance shrinkage into the Slow Feature Analysis (SFA) framework. CSSFA mitigates estimation bias in high-dimensional settings and improves the robustness and interpretability of extracted features. Experiments on two NIR tobacco datasets demonstrate that CSSFA effectively captures features related to product quality stability and achieves accurate anomaly detection without requiring large numbers of abnormal samples. This work provides a scalable and interpretable strategy for NIR-based anomaly detection in agricultural products when abnormal samples are rare or unavailable.
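Covariance shrinkage, in its generic linear form, can be sketched as follows. This is an assumption-laden illustration rather than the CSSFA estimator: the abstract does not state the shrinkage target or intensity, so a scaled-identity target and a fixed, hand-picked `alpha` are assumed here, and both function names are hypothetical.

```python
def sample_covariance(rows):
    """Unbiased sample covariance matrix of a list of equal-length samples."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]


def shrink_covariance(cov, alpha):
    """Linear shrinkage toward a scaled identity: (1 - alpha) * S + alpha * mu * I.

    mu is the mean of the diagonal of S, so the trace is preserved.
    alpha in [0, 1] is assumed to be chosen by the user.
    """
    p = len(cov)
    mu = sum(cov[i][i] for i in range(p)) / p
    return [
        [(1 - alpha) * cov[i][j] + (alpha * mu if i == j else 0.0) for j in range(p)]
        for i in range(p)
    ]


# Perfectly collinear toy data: the sample covariance is singular.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
S = sample_covariance(rows)
S_shrunk = shrink_covariance(S, 0.5)
print(S, S_shrunk)
```

In the toy data the sample covariance has zero determinant, while the shrunk matrix is invertible. That matters in high-dimensional settings because slow feature analysis requires whitening the data with the inverse covariance.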
An unsupervised approach to anomaly detection in near-infrared spectroscopy via Covariance-Shrunk Slow Feature Analysis. Chemometrics and Intelligent Laboratory Systems, vol. 269, Article 105615.
Pub Date : 2025-12-12 DOI: 10.1016/j.chemolab.2025.105616
Luiz Renato Rosa Leme de Souza, Carlos Alberto Rios, Márcia Cristina Breitkreitz
Context:
Python is a widely known, robust open-source programming language used in many research fields. Artificial Intelligence (AI) is a growing tool with many applications that can help with long and difficult tasks. Routines for preprocessing spectral signals and applying chemometric models are usually part of expensive software. Despite the existence of isolated code snippets, libraries, and tutorials, it is hard to find an open-access routine that leads from the raw Raman mapping data set to the chemical information contained in the analyzed samples by means of chemical maps.
Objectives:
This paper presents an AI-assisted Python-based routine for preprocessing Raman mapping results and generating chemical maps of samples using the chemometric methods: CLS, PLS and PCA, with the goal of providing an open access routine for research purposes.
Methods:
The routine was written in the Python programming language, with AI tools serving as code generators, translators, and debuggers during its creation, and the results were compared to those obtained with a Matlab routine.
Results:
The Python routine successfully performed the preprocessing of the Raman spectra and the calculations of the chemometric methods CLS, PLS and PCA, generating chemical maps. The results were equivalent to those of Matlab for the same data set, leading to the same conclusions.
Conclusion:
This paper demonstrated the application of an open access Python-based AI-guided routine to preprocess and generate chemical maps applying CLS, PCA and PLS models, now available and editable to suit different needs.
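To illustrate how a CLS chemical map is built, the following minimal sketch unmixes each pixel spectrum of a small Raman map against two known pure-component spectra by ordinary least squares. It is not the published routine: it assumes exactly two components and no preprocessing, and the function names and toy spectra are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def cls_two_components(spectrum, pure_a, pure_b):
    """Least-squares contributions (c_a, c_b) with spectrum ~ c_a*pure_a + c_b*pure_b.

    Solves the 2x2 normal equations by Cramer's rule.
    """
    aa, ab, bb = dot(pure_a, pure_a), dot(pure_a, pure_b), dot(pure_b, pure_b)
    ad, bd = dot(pure_a, spectrum), dot(pure_b, spectrum)
    det = aa * bb - ab * ab
    if det == 0:
        raise ValueError("pure spectra are collinear; CLS is not identifiable")
    return (ad * bb - ab * bd) / det, (aa * bd - ab * ad) / det


def chemical_maps(cube, pure_a, pure_b):
    """Apply CLS to every pixel of a rows x cols x channels map.

    Returns two rows x cols grids, one per component.
    """
    map_a, map_b = [], []
    for row in cube:
        scores = [cls_two_components(pixel, pure_a, pure_b) for pixel in row]
        map_a.append([s[0] for s in scores])
        map_b.append([s[1] for s in scores])
    return map_a, map_b


# Toy pure spectra with four channels and a 1 x 2 pixel map.
pure_a = [1.0, 0.0, 2.0, 0.0]
pure_b = [0.0, 3.0, 1.0, 0.0]
cube = [[[0.25, 2.25, 1.25, 0.0], [1.0, 0.0, 2.0, 0.0]]]
print(chemical_maps(cube, pure_a, pure_b))
```

The first pixel is a 0.25 : 0.75 combination of the two pure spectra and the second is pure component A, so the two maps recover those contributions. With more components the same normal equations would normally be solved with a linear algebra library.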
Raman mapping and Chemometrics: An open access Python-based routine to preprocess and generate chemical maps applying CLS, PCA and PLS methods. Luiz Renato Rosa Leme de Souza, Carlos Alberto Rios, Márcia Cristina Breitkreitz. Chemometrics and Intelligent Laboratory Systems, vol. 269, Article 105616. Published 2025-12-12. DOI: 10.1016/j.chemolab.2025.105616
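The CLS step of the Raman mapping routine described above can be illustrated with a minimal sketch. It is not the authors' code. It assumes the map is stored as a NumPy array of shape (rows, cols, wavenumbers) and that reference spectra of the pure components are available. The function name `cls_chemical_maps` and the synthetic Gaussian bands are hypothetical, chosen only for illustration.

```python
import numpy as np


def cls_chemical_maps(cube, pure_spectra):
    """Classical least squares (CLS) scores for every pixel of a spectral map.

    cube:         array of shape (rows, cols, n_points), one spectrum per pixel
    pure_spectra: array of shape (n_components, n_points), reference spectra
    Returns an array of shape (rows, cols, n_components); slice k is the
    chemical map of component k.
    """
    rows, cols, n_points = cube.shape
    pixels = cube.reshape(rows * cols, n_points)
    # Solve pure_spectra.T @ scores ~= pixels.T for all pixels at once.
    scores, *_ = np.linalg.lstsq(pure_spectra.T, pixels.T, rcond=None)
    return scores.T.reshape(rows, cols, -1)


# Synthetic, noise-free demonstration data (placeholders, not measured spectra).
axis = np.linspace(0.0, 1.0, 50)
pure_spectra = np.stack([
    np.exp(-((axis - 0.3) ** 2) / 0.005),  # hypothetical component A band
    np.exp(-((axis - 0.7) ** 2) / 0.005),  # hypothetical component B band
])
true_scores = np.array([
    [[1.0, 0.0], [0.5, 0.5]],
    [[0.2, 0.8], [0.0, 1.0]],
])  # a 2 x 2 pixel map with two components
cube = true_scores @ pure_spectra  # each pixel is a linear mix of the pure spectra

maps = cls_chemical_maps(cube, pure_spectra)
```

Because the synthetic pixels are exact linear combinations of the reference spectra, `maps` recovers `true_scores` to within floating-point error. Real Raman data would first need the preprocessing the abstract mentions, and the fit would be approximate.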
Pub Date : 2025-12-08 DOI: 10.1016/j.chemolab.2025.105606
Sergej Papoci, Manuel Jiménez, Michele Ghidotti, María Beatriz de la Calle Guntiñas
The willingness of consumers to pay higher prices for high-quality specialties, such as Darjeeling tea, goes hand in hand with an increase in fraudulent practices in which Darjeeling tea is substituted totally or partially by cheaper teas. Currently, to evaluate the percentage of substitution that a method can detect, Darjeeling tea is mixed in different proportions with non-Darjeeling teas, and the mixture is analysed after homogenisation. This time-consuming approach consumes valuable amounts of sample; therefore, an alternative approach is needed. Here a method is described to calculate the minimum detectable percentage of substitution of Darjeeling tea by other teas without needing to prepare real mixtures. The approach is based on virtual mixtures built from the results obtained for commercially available Darjeeling and non-Darjeeling teas. The authentication method used the elemental profiles of tea obtained by Energy Dispersive X-ray Fluorescence, combined with chemometrics and modelling by Partial Least Squares-Discriminant Analysis. The percentage of false positives at different substitution levels was evaluated and compared with the results obtained with real mixtures of Darjeeling and non-Darjeeling teas. Comparable results were obtained with both approaches. Twenty percent was the lowest substitution level that could be detected with acceptable sensitivity (94 %) and specificity (86 %). A fast, easy-to-implement approach has been developed and validated to calculate the minimum substitution percentage that an analytical authentication method can detect, without the need for additional laboratory experiments.
Evaluation of a mathematical approach to detect fraudulent substitution of Darjeeling tea with other types of tea using the elemental profiles obtained by Energy Dispersive X-ray Fluorescence. Sergej Papoci, Manuel Jiménez, Michele Ghidotti, María Beatriz de la Calle Guntiñas. Chemometrics and Intelligent Laboratory Systems, vol. 269, Article 105606. Published 2025-12-08. DOI: 10.1016/j.chemolab.2025.105606
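The virtual-mixture idea in the Darjeeling abstract above can be sketched as follows. The abstract does not state how the virtual mixtures are computed, so this sketch assumes simple linear mixing: each element's value in the mixture is the weighted average of the two measured profiles. The function names, labels and numbers are hypothetical placeholders, not data from the study.

```python
def virtual_mixture(darjeeling, substitute, fraction):
    """Elemental profile of a virtual mixture (assumed linear mixing).

    darjeeling, substitute: sequences of element values in the same order
    fraction: share of the substitute tea in the mixture, between 0 and 1
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    if len(darjeeling) != len(substitute):
        raise ValueError("profiles must list the same elements")
    return [(1.0 - fraction) * d + fraction * s
            for d, s in zip(darjeeling, substitute)]


def sensitivity_specificity(true_labels, predicted_labels, positive):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)


# Placeholder profiles for three elements (illustrative numbers only).
darjeeling_profile = [10.0, 20.0, 30.0]
other_tea_profile = [20.0, 10.0, 30.0]
mixture_20 = virtual_mixture(darjeeling_profile, other_tea_profile, 0.20)

# Placeholder classifier output, used only to exercise the metric function.
truth = ["pure", "pure", "mixed", "mixed", "mixed"]
predicted = ["pure", "mixed", "mixed", "mixed", "pure"]
sens, spec = sensitivity_specificity(truth, predicted, positive="mixed")
```

In a workflow like the one the abstract outlines, profiles such as `mixture_20` would be generated at several substitution levels and passed to a classifier trained for authentication (PLS-DA in the paper). The lowest level with acceptable sensitivity and specificity would then be the minimum detectable substitution.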