Pub Date: 2026-01-05; DOI: 10.1016/j.chemolab.2026.105629
Ouguan Xu, Zeyu Yang, Zhiqiang Ge
Distributed principal component analysis (PCA) has been widely used for monitoring large-scale industrial processes in recent years, and many improved forms and extensions have been proposed. This paper introduces a deep residual form of PCA into the distributed modeling framework in order to improve monitoring performance for large-scale industrial processes. A deep residual PCA model is first developed for feature engineering in each separate block of the process; the augmented features extracted from the different blocks are then combined at a second level to construct an additional deep residual PCA model. By further augmenting the features extracted from the different layers of the deep residual model, the final monitoring scheme for large-scale industrial processes is formulated. In two industrial case studies, the proposed distributed deep learning model improved monitoring performance by more than 20 %, while the computational burden of the new method was kept at a low level.
Title: Distributed learning of deep residual principal component analysis for large-scale industrial process monitoring. Chemometrics and Intelligent Laboratory Systems, vol. 269, Article 105629.
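The two-level idea in the abstract above (one model per process block, then a second model on the combined block features) can be illustrated with plain PCA. This is only a minimal sketch under stated assumptions: ordinary PCA stands in for the paper's deep residual PCA, the block features are taken to be the block scores plus the block squared prediction error (SPE), and the data, block layout, and all function names are hypothetical.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit PCA on standardized data; keep loadings and score variances."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    # Rows of Vt are the loading vectors of the standardized data.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return {"mean": mean, "std": std,
            "P": Vt[:n_components].T,
            "var": s[:n_components] ** 2 / (X.shape[0] - 1)}

def project(model, X):
    """Return scores and squared prediction error (SPE) for each row of X."""
    Z = (X - model["mean"]) / model["std"]
    T = Z @ model["P"]
    spe = np.sum((Z - T @ model["P"].T) ** 2, axis=1)
    return T, spe

def block_features(models, blocks, X):
    """Level 1: per-block scores augmented with the block SPE (an assumption)."""
    feats = []
    for model, cols in zip(models, blocks):
        T, spe = project(model, X[:, cols])
        feats.append(np.column_stack([T, spe]))
    return np.hstack(feats)

def monitor(level2, F):
    """Level 2: Hotelling T^2 and SPE computed on the combined block features."""
    T, spe = project(level2, F)
    t2 = np.sum(T ** 2 / level2["var"], axis=1)
    return t2, spe

# Synthetic process: two blocks of three variables, each driven by one latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
X_train = np.repeat(latent, 3, axis=1) + 0.3 * rng.normal(size=(500, 6))
blocks = [np.arange(0, 3), np.arange(3, 6)]

block_models = [fit_pca(X_train[:, cols], 1) for cols in blocks]
F_train = block_features(block_models, blocks, X_train)
level2 = fit_pca(F_train, 2)
t2_train, spe_train = monitor(level2, F_train)

# A simulated sensor bias on the first variable of block 1.
x_fault = X_train[:1].copy()
x_fault[0, 0] += 8.0
t2_fault, spe_fault = monitor(level2, block_features(block_models, blocks, x_fault))
```

In practice the alarm limits would come from a control-limit calculation on the training statistics; here the faulty sample is simply compared with the largest training values.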
Pub Date: 2026-01-05; DOI: 10.1016/j.chemolab.2025.105626
Aylin Alin
We propose a robust penalized smooth partial least squares approach that (i) smooths high-dimensional discretized functional predictors via blockwise B-spline bases, (ii) applies robust SIMPLS to obtain latent scores, (iii) fits a penalized regression in the latent space whose penalty is exactly a block-diagonal roughness penalty on the coefficient function(s), and (iv) aggregates models through bootstrap ensembles (classical/sufficient resampling; mean/median aggregation). The method supports multiple functional predictors through a block-diagonal construction and yields interpretable smooth coefficient functions. Our method demonstrates competitive or superior performance under collinearity and contamination.
Title: Ensemble robust SIMPLS with block-penalized smoothing for scalar-on-function regression. Chemometrics and Intelligent Laboratory Systems, vol. 269, Article 105626.
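Step (iii) of the abstract above says the latent-space penalty equals a block-diagonal roughness penalty on the coefficient functions. One way this can hold is sketched below. The second-derivative penalty, the linear map W from latent coefficients to basis coefficients, and all symbols are assumptions made for illustration, not the paper's notation.

```latex
% Coefficient function of predictor m in a B-spline basis (assumed notation):
\beta_m(t) = \sum_{j=1}^{K_m} b_{mj}\, B_{mj}(t), \qquad m = 1,\dots,M .

% Roughness penalty on each coefficient function as a quadratic form:
\lambda_m \int \bigl(\beta_m''(t)\bigr)^2 \, dt = \lambda_m\, b_m^{\top} R_m\, b_m ,
\qquad (R_m)_{jk} = \int B_{mj}''(t)\, B_{mk}''(t)\, dt .

% Stacking b = (b_1^{\top},\dots,b_M^{\top})^{\top} gives a block-diagonal penalty:
b^{\top} R\, b , \qquad R = \operatorname{blockdiag}(\lambda_1 R_1,\dots,\lambda_M R_M) .

% If the basis coefficients are a linear map of the latent coefficients, b = W\gamma,
% the same penalty is carried exactly into the latent space:
b^{\top} R\, b = \gamma^{\top} \bigl(W^{\top} R\, W\bigr)\, \gamma ,
\qquad
\hat{\gamma} = \bigl(T^{\top} T + W^{\top} R\, W\bigr)^{-1} T^{\top} y ,
```

where T is the matrix of latent scores and y the scalar response. Under these assumptions, penalizing the latent coefficients with WᵀRW is the same as penalizing the roughness of the coefficient functions themselves.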
Pub Date: 2026-01-02; DOI: 10.1016/j.chemolab.2025.105623
Xiaogang Deng, Ziheng Wang, Lumeng Huang, Ping Wang
Deep neural networks have been widely adopted for developing quality prediction models in industrial processes. Despite their strong capability to extract nonlinear intrinsic features, existing models have notable drawbacks: insufficient capture of local spatio-temporal features, high computational complexity of training, difficulty in determining the deep model structure, and a lack of interpretability. To address these issues, this paper presents an efficient automated deep spatio-temporal feature learning framework for dynamic industrial process soft sensing, named Deep Convolutional Partial Least Squares (DeCPLS). The proposed approach introduces the Convolutional Partial Least Squares (CPLS) model as a basic feature extraction unit and stacks multiple CPLS layers to construct an efficient deep dynamic feature learning model. A layerwise training mechanism is presented to automate the determination of model structure and hyperparameters, thereby reducing computational complexity. Furthermore, a mechanism for explaining model prediction errors is introduced to analyze prediction outcomes. Compared with classical deep neural networks, the proposed method captures local spatio-temporal features efficiently while keeping computational complexity acceptable. Finally, the method is validated on a simulated industrial case study and a real-world industrial application.
Title: An efficient automated deep spatio-temporal feature learning framework for industrial soft sensing. Chemometrics and Intelligent Laboratory Systems, vol. 269, Article 105623.
Pub Date: 2026-01-02; DOI: 10.1016/j.chemolab.2025.105625
Sughra Sarwar, Tahir Mehmood, Mudassir Iqbal
Fourier Transform Infrared (FTIR) spectroscopy enables rapid, non-destructive examination of complex materials; however, its high-dimensional data pose challenges of dimensionality, noise, and outliers. Conventional regression methods, such as Multivariate Adaptive Regression Splines (MARS), often struggle with these problems, which frequently involve small sample sizes and numerous variables, particularly in chemometric studies. This study proposes a hybrid framework that combines MARS with Principal Component Analysis (PCA), Kernel PCA (KPCA), and linear regression (LR) to overcome these drawbacks. FTIR was used to investigate 30 honey samples from various geographical areas in Pakistan. Before modeling, outlier detection was conducted using the Mahalanobis distance, computed in both PCA- and KPCA-transformed spaces, with Minimum Covariance Determinant (MCD) estimators applied to identify and remove statistical outliers. This preprocessing step ensured that anomalous samples did not influence the model's accuracy. The model consistently identified chemically relevant wavenumbers, especially in the high-energy C–H, O–H, and N–H stretching zones (e.g., 2966, 3010, and 3584 cm⁻¹). These wavenumbers correspond to important functional groups involved in the absorption and bioaccumulation of pollutants in honey. To assess model performance, we used a 70:30 train-test split. The MARS-PCA-LR model (RMSE = 1.2905, MAE = 1.2725, MSE = 1.2887) outperformed the standard MARS baseline (RMSE = 4.5860, MAE = 3.4267, MSE = 21.0319) and the MARS-KPCA-LR model (RMSE = 1.5017, MAE = 1.3300, MSE = 1.5013) in terms of prediction accuracy. These results suggest that the proposed MARS-PCA-LR and MARS-KPCA-LR models offer improved interpretability and robustness, making them reliable techniques for analyzing high-dimensional spectral data.
Title: A novel statistical framework to address FTIR spectral challenges: Hybrid MARS–PCA/KPCA models for pollutants analysis in honey samples. Chemometrics and Intelligent Laboratory Systems, vol. 269, Article 105625.
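The outlier screening step described in the abstract above (Mahalanobis distance computed in a PCA-transformed space) might look roughly like the sketch below. It rests on several assumptions: the spectra are synthetic, the classical covariance is used in place of the MCD estimator named in the abstract, two components are retained, and the cutoff is the 97.5 % chi-square quantile for two degrees of freedom, which has the closed form -2 ln(0.025).

```python
import numpy as np

def pca_scores(X, n_components):
    """Project mean-centered spectra onto the leading principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def mahalanobis_sq(T):
    """Squared Mahalanobis distance of each row of T from the sample mean.

    Uses the classical covariance; a robust estimator such as MCD would
    replace the mean and covariance here.
    """
    centered = T - T.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(T, rowvar=False))
    return np.einsum("ij,jk,ik->i", centered, cov_inv, centered)

# Synthetic stand-in for 30 spectra with 50 wavenumber channels.
rng = np.random.default_rng(1)
spectra = rng.normal(size=(30, 50))
spectra[0] += 5.0  # one grossly shifted sample

scores = pca_scores(spectra, 2)
d2 = mahalanobis_sq(scores)

# 97.5 % quantile of the chi-square distribution with 2 degrees of freedom.
cutoff = -2.0 * np.log(0.025)
outliers = np.flatnonzero(d2 > cutoff)
```

With a classical covariance a single extreme sample can inflate the estimate and partly mask itself, which is the usual motivation for a robust estimator such as MCD.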
Pub Date: 2025-12-30; DOI: 10.1016/j.chemolab.2025.105627
Zichuan Bu, Jihong Liu, Jiageng Zhang, Chi Liu, Yihua Liu, Kaili Ren, Xuewen Yan, Wei Gao, Jun Dong
Raman spectroscopy is a pivotal tool in analytical and physical chemistry, yet its application to complex systems is hindered by spectral superposition and the resulting analysis challenges. Deep learning has opened new approaches to the component analysis of complex mixtures. This study proposes a mixture component identification method, named MCI, based on a masked autoencoder (MAE) and a convolutional neural network (CNN). The aim is to address both qualitative recognition and quantitative analysis in the Raman spectra of mixtures. The MCI method adopts a multi-stage framework. First, the Voigt function is used to extract the characteristic peaks of the mixture. Second, the MAE model reconstructs the corresponding pure-substance spectra. Third, the CNN model performs qualitative and quantitative analyses on the reconstructed spectra. Finally, the spectrum of the remaining components is obtained by subtracting the reconstructed spectrum from the mixture spectrum. Iterating this process unmixes a complex mixture step by step. On the generated mixed-sample test data, MCI outperforms the three comparison models in complete recognition accuracy and in the per-substance evaluation indicators for qualitative analysis, while maintaining a lower average concentration error in quantitative analysis. Moreover, for complex mixtures containing interfering substances, MCI shows strong anti-interference ability and maintains high identification accuracy. On measured Raman spectra of mixed samples, the MCI model achieved an average accuracy and F1 score of 97 % across all test samples, further supporting its reliability and practicality in detecting the main components of real, complex mixtures. In summary, this study provides a new technical method for Raman spectral analysis of complex mixtures, with both theoretical significance and practical value.
Title: Deep learning-driven components analysis of Raman spectral mixtures: An integrated masked autoencoder with convolutional neural network approach. Chemometrics and Intelligent Laboratory Systems, vol. 269, Article 105627.
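The iterate-and-subtract loop described in the abstract above can be illustrated without any neural network. The sketch below is a simplified stand-in, not the MCI method: a small library of known pure spectra replaces the MAE reconstruction, a least-squares amplitude replaces the CNN's identification and quantification, and Lorentzian peaks replace Voigt profiles. All names, peak positions, and the stopping threshold are hypothetical.

```python
import numpy as np

def lorentzian(x, center, width):
    """Unit-height Lorentzian peak (used here instead of a Voigt profile)."""
    return width ** 2 / ((x - center) ** 2 + width ** 2)

def unmix(mixture, library, max_iter=5, tol=0.05):
    """Greedy step-by-step unmixing: identify, quantify, subtract, repeat."""
    residual = mixture.copy()
    found = {}
    for _ in range(max_iter):
        # Least-squares amplitude of each unused pure spectrum in the residual.
        amps = {name: float(residual @ s) / float(s @ s)
                for name, s in library.items() if name not in found}
        if not amps:
            break
        best = max(amps, key=amps.get)
        if amps[best] < tol:
            break  # nothing substantial left to explain
        found[best] = amps[best]
        # Remove the identified component and continue with what remains.
        residual = residual - amps[best] * library[best]
    return found, residual

# Raman shift axis and a toy library of three pure substances.
shift = np.linspace(200.0, 2000.0, 1801)
library = {
    "A": lorentzian(shift, 500.0, 10.0),
    "B": lorentzian(shift, 1000.0, 10.0),
    "C": lorentzian(shift, 1500.0, 10.0),
}
mixture = 0.7 * library["A"] + 0.3 * library["B"]

found, residual = unmix(mixture, library)
```

The greedy choice works here because the toy peaks barely overlap; strongly overlapping spectra are exactly the case where a learned reconstruction step, as in the abstract, would be expected to help.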
Pub Date: 2025-12-29; DOI: 10.1016/j.chemolab.2025.105624
Viviana Consonni, Davide Ballabio, Enmanuel Cruz Muñoz, Veronica Termopoli, Roberto Todeschini
The large number of measurable variables now available has considerably increased the complexity of data. In decision-making, this creates a need for adequate tools to set priorities and rank the available options. Ordering is one way to analyse multivariate data, and it provides an overview of the relationships among the elements of a system. Multi-Criteria Decision Making (MCDM) encompasses a broad set of methods that produce priority-ranked lists of alternatives from multiple criteria in support of decision problems. Among the most widely adopted techniques, TOPSIS, dominance-based approaches, the Analytic Hierarchy Process (AHP), and Copeland scores are classical methodologies in both theoretical research and applied decision analysis.
Among the dominance-based approaches, an effective MCDM method is the Power-Weakness Ratio (PWR), which generates a tournament table (i.e., the pairwise comparison matrix) from a data matrix with a varying number of samples (i.e., the alternatives to be compared) and variables (i.e., the criteria for pairwise comparisons), weighted according to their relative importance in determining the final ranking. This study presents a variant of the classical Power-Weakness Ratio that significantly modifies the way the tournament table is obtained. The method, called smoothed Power-Weakness Ratio (sPWR), takes into account the degree of dominance of the alternatives in each pairwise comparison by exploiting the differences between the criterion values. The rationale behind the method is described with the aid of an illustrative example on a simple benchmark dataset with a known reference ranking of the samples. The main advantage of the new method over PWR is that its tournament table is much more informative and more sensitive to the original data values than the classical pairwise comparison matrix. A multivariate comparison with other classical MCDM methods, performed on several diverse datasets, showed that the results obtained by sPWR were quite similar to those obtained by the Copeland score and by TOPSIS with range scaling. However, sPWR showed a higher tendency to generate full rankings, with an enhanced ability to remove ties in the pairwise comparisons.
Title: Smoothed Power-Weakness Ratio (sPWR): a new informative system for multi-criteria decision making. Chemometrics and Intelligent Laboratory Systems, vol. 270, Article 105624.
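The difference between a classical tournament table and one that uses the size of the criterion differences can be shown on a tiny example. The sketch below is an illustration of the general idea only: the smoothing rule (positive differences divided by the criterion range) is an assumption and not the sPWR formula from the paper, all criteria are assumed to be maximized, and the data are invented.

```python
import numpy as np

def tournament_tables(X, weights):
    """Return (hard, smooth) pairwise comparison matrices.

    hard[i, j]   = total weight of the criteria on which i strictly beats j.
    smooth[i, j] = weighted sum of the range-scaled margins by which i beats j
                   (an illustrative smoothing rule, assumed for this sketch).
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    diff = X[:, None, :] - X[None, :, :]      # diff[i, j, k] = x_ik - x_jk
    span = X.max(axis=0) - X.min(axis=0)
    span = np.where(span > 0, span, 1.0)      # guard against constant criteria
    hard = (w * (diff > 0)).sum(axis=2)
    smooth = (w * np.clip(diff, 0.0, None) / span).sum(axis=2)
    return hard, smooth

# Three alternatives (rows) scored on two equally weighted criteria (columns).
data = [[10.0, 4.0],   # alternative 0
        [5.0, 5.0],    # alternative 1
        [0.0, 0.0]]    # alternative 2
hard, smooth = tournament_tables(data, weights=[1.0, 1.0])
```

Alternatives 0 and 1 each win one criterion, so the hard table ties them at 0.5 each, whereas the smoothed entries differ because alternative 0 wins its criterion by a much larger relative margin.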
Pub Date: 2025-12-22; DOI: 10.1016/j.chemolab.2025.105622
Leonardo J. Duarte, Gustavo G. Marcheafave, Elis D. Pauli, Ieda S. Scarminio, Roy E. Bruns
Spectra from two-level factorial designs can be transformed into t spectral representations to analyze changes in metabolite abundances caused by environmental impacts. This transformation involves performing a paired t-test for each spectroscopic variable. These tests are sensitive to deviations of the spectral data from normality and to heterogeneous variances at the different factorial design levels. Although existing spectral information for metabolites can help guide interpretation, permutation calculations can be performed to obtain the statistical significance and t values of metabolic peaks, and these are expected to be less sensitive to the parametric assumptions. The results of these calculations are reported here and compared with parametric results for 13,501 NMR spectral variables from two-level factorial design data on ethanol, dichloromethane, and ethanol-dichloromethane (1:1) mixture extracts of yerba mate leaf samples. All t-representation peaks found to be statistically significant by the parametric calculations are confirmed by the permutation calculations. The permutation results do not indicate any new significant peaks beyond those predicted by the parametric results. Permutation calculations are therefore recommended to validate results obtained from parametric determinations of statistical significance.
Title: Comparison of parametric and permutation t spectral representations for determining individual metabolite abundances from factorial design spectra. Chemometrics and Intelligent Laboratory Systems, vol. 269, Article 105622.
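For a single spectral variable, a permutation counterpart of the paired t-test can be built by flipping the sign of each paired difference, which is the standard exact permutation scheme for paired data. The sketch below is a minimal illustration with invented intensities for six paired samples; whether the paper uses this exact scheme, and the function names, are assumptions.

```python
from itertools import product
from math import sqrt
from statistics import mean, stdev

def paired_t(diffs):
    """Paired t statistic: mean difference divided by its standard error."""
    sd = stdev(diffs)
    if sd == 0.0:
        # Degenerate case (all differences identical); avoid dividing by zero.
        m = mean(diffs)
        return 0.0 if m == 0 else float("inf") * (1 if m > 0 else -1)
    return mean(diffs) / (sd / sqrt(len(diffs)))

def sign_flip_p_value(low, high):
    """Exact two-sided permutation p-value over all 2**n sign flips."""
    diffs = [h - l for l, h in zip(low, high)]
    t_obs = paired_t(diffs)
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        t_perm = paired_t([s * d for s, d in zip(signs, diffs)])
        # Small tolerance so the observed arrangement always counts itself.
        hits += abs(t_perm) >= abs(t_obs) - 1e-12
        total += 1
    return t_obs, hits / total

# Invented intensities of one spectral variable at the two factor levels.
low_level = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3]
high_level = [1.6, 1.9, 1.3, 1.8, 1.5, 2.1]
t_obs, p_perm = sign_flip_p_value(low_level, high_level)
```

Repeating this for every spectral variable gives a permutation-based t representation that can be compared peak by peak with the parametric one. For thousands of variables and larger designs, a random subset of sign flips is usually drawn instead of the full enumeration.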
Pub Date: 2025-12-18; DOI: 10.1016/j.chemolab.2025.105621
Rodrigo Canarin de Oliveira, Hector Hernan Hernandez Zarta, Wargner Alonso Moreno Losada, Sebastián Javier Caruso, Hágata Cremasco, Evandro Bona, Douglas N. Rutledge, Diego Galvan
Soybean is a major global commodity. Given its importance, ensuring traceability becomes essential. Genetic, climatic, and soil-related factors influence its chemical composition. Integrating multi-source data using a multiblock analysis represents a powerful approach to differentiating soybeans and monitoring their traceability. This study employed an extension of the ComDim method (also known as Common Components and Specific Weights Analysis, CCSWA) to simultaneously differentiate 20 Brazilian soybean varieties, conventional and transgenic, based on cultivation region and cultivation type. The extension replaced the PCA (Principal Components Analysis) used in classical ComDim by CCA (Common Components Analysis). Forty samples cultivated in Londrina and Ponta Grossa (Paraná, Brazil) were analyzed for their fatty acid, amino acid, isoflavone, and mineral profiles using GC-FID, IEC, HPLC-DAD, and ICP-OES. The CCA-based ComDim results revealed that Common Component 2 (CC2) was primarily responsible for distinguishing the geographical regions of Londrina and Ponta Grossa. The global loadings of CC2 indicated that zinc (Zn), manganese (Mn), oleic acid, arginine, and malonyl genistin were the most influential variables in this component. In contrast, CC3 was associated with differentiating conventional and transgenic cultivars. The global loadings highlighted linoleic acid, oleic acid, α-linolenic acid, malonyl glycitin, malonyl genistin, Fe, Zn, and Mn as the most relevant contributors. The combined CC2 and CC3 plots indicated tendencies toward differentiation of soybean samples by cultivation region and cultivation type. This result highlights the potential of CCA-based ComDim as an effective tool for soybean traceability.
{"title":"A multi-source data integration for soybean differentiation through multiblock data analysis using a novel adaptation of ComDim","authors":"Rodrigo Canarin de Oliveira, Hector Hernan Hernandez Zarta, Wargner Alonso Moreno Losada, Sebastián Javier Caruso, Hágata Cremasco, Evandro Bona, Douglas N. Rutledge, Diego Galvan","doi":"10.1016/j.chemolab.2025.105621","journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"269","pages":"Article 105621"},"PeriodicalIF":3.8,"publicationDate":"2025-12-18","publicationTypes":"Journal Article","isOpenAccess":false}
Pub Date : 2025-12-18 DOI: 10.1016/j.chemolab.2025.105619
Mohamed El-dosuky, Aboul Ella Hassanien, Heba Alshater, Rania Ahmed, Sameh H. Basha, Heba AboulElla, Ashraf Darwish, Sara Abdelghafar
This paper introduces a neutrosophic deep learning model for automated detection and classification of palladium nanoparticles in scanning electron microscopy (SEM) images, distinguishing between ordered and disordered structures for accurate nanoparticle characterization. The model follows a five-phase pipeline for enhanced accuracy and efficiency. The first phase is data augmentation, which applies transformations such as rotation and flipping to improve dataset diversity. The second phase uses neutrosophic image segmentation to manage uncertainty and noise in SEM images, allowing for the precise isolation of nanoparticle regions. In the third phase, the VGG-19 deep neural network extracts high-level features, initially identifying 25,088 features. In the fourth phase, a hybrid approach combining Gini importance and Genetic Optimized Rough Sets (GORS) reduces the number of features to 2454. In the fifth phase, the refined feature set is classified using a Random Forest classifier, which effectively distinguishes between ordered and disordered palladium nanoparticles. To validate its performance, the proposed model was evaluated on a dataset of 1000 SEM images of carbon-based materials with deposited palladium nanoparticles, which was then expanded to 1500 images to address class imbalance and minimize overfitting. The experimental results highlight the model's strong potential as a high-performance classification tool for nanoparticle analysis in SEM images, achieving an overall accuracy of 99.67 %. To evaluate the impact of the introduced phases on the proposed model's performance, four ablation experiments were conducted, demonstrating the significance of each phase. Dropping data augmentation and feature reduction reduced accuracy to approximately 97.5 %, while dropping the feature extraction phase reduced it further to 94.17 %, highlighting the critical impact of these processes on performance and robustness.
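The feature counts quoted in the abstract can be sanity-checked with a few lines of arithmetic. The sketch below assumes the standard VGG-19 configuration (a 224 × 224 input, five 2 × 2 max-pooling stages, 512 channels in the last convolutional block) and assumes the features are the flattened output of the last pooling layer; the abstract states neither, so treat both as assumptions. The function name and its parameters are illustrative only.

```python
def vgg19_flat_features(input_side=224, pool_stages=5, channels=512):
    """Size of the flattened last pooling output of a VGG-19-style network.

    Assumption: each pooling stage halves the spatial side length.
    """
    side = input_side // (2 ** pool_stages)   # 224 -> 7 after five halvings
    return side * side * channels


extracted = vgg19_flat_features()             # features before reduction
retained = 2454                               # count reported after Gini + GORS
retained_percent = 100.0 * retained / extracted

print(f"extracted features: {extracted}")
print(f"retained after reduction: {retained} ({retained_percent:.2f} %)")
```

Under these assumptions the extracted count matches the 25,088 features reported, and the fourth phase keeps roughly one feature in ten.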
{"title":"Detecting and classifying palladium nanoparticles in microscopic images using neutrosophic deep learning","authors":"Mohamed El-dosuky, Aboul Ella Hassanien, Heba Alshater, Rania Ahmed, Sameh H. Basha, Heba AboulElla, Ashraf Darwish, Sara Abdelghafar","doi":"10.1016/j.chemolab.2025.105619","journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"269","pages":"Article 105619"},"PeriodicalIF":3.8,"publicationDate":"2025-12-18","publicationTypes":"Journal Article","isOpenAccess":false}
Pub Date : 2025-12-17 DOI: 10.1016/j.chemolab.2025.105620
Meixuan Zhao, Pengcheng Gu, Yuwang Han
Photoacoustic spectroscopy (PAS) is a powerful technique for detecting trace gas mixtures, with applications spanning industrial safety, environmental monitoring, and energy systems. However, when it is applied to the three crucial indicator gases methane (CH₄), ethane (C₂H₆), and ethylene (C₂H₄), strong spectral overlaps introduce cross-interference that complicates accurate concentration retrieval. To address limitations of conventional chemometric and machine learning approaches, such as poor generalization across concentration ranges and vulnerability to interference, this study proposes a hybrid model integrating Support Vector Machine (SVM) classification with a Kernel Extreme Learning Machine (KELM) enhanced by Chaotic Particle Swarm Optimization (CPSO). The workflow includes wavelet-based denoising, feature selection via Competitive Adaptive Reweighted Sampling (CARS), dynamic thresholding by SVM to partition samples into high- and low-concentration regimes, and a final regression step using KELM. The proposed approach significantly improves detection accuracy across a wide concentration range (0.5–500 ppm). Experimental results show that the SVM-CPSO-KELM model achieves an average prediction error of 5.44 %, with a maximum error below 14.37 %.
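The partition-then-regress idea in this workflow can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: a fixed signal threshold stands in for the SVM classifier, the KELM hyperparameters (`C` and the kernel `scale`) are set by hand rather than tuned by CPSO, denoising and CARS are omitted, and the data are a made-up linear response (signal = 0.01 × concentration). The regression step uses the usual closed-form KELM solution, in which the output weights solve (K + I/C) α = y for a kernel matrix K.

```python
import math


def rbf(a, b, scale):
    """Gaussian (RBF) kernel between two scalar signals."""
    return math.exp(-((a - b) / scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


class KELM:
    """Kernel extreme learning machine for scalar inputs (closed-form fit)."""

    def __init__(self, C, scale):
        self.C, self.scale = C, scale

    def fit(self, xs, ys):
        n = len(xs)
        # Regularized kernel matrix K + I/C
        K = [[rbf(xs[i], xs[j], self.scale) + (1.0 / self.C if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        self.xs = list(xs)
        self.alpha = solve(K, list(ys))
        return self

    def predict(self, x):
        return sum(a * rbf(x, xi, self.scale) for a, xi in zip(self.alpha, self.xs))


def gate(signal, threshold=0.1):
    """Hypothetical stand-in for the SVM: route a sample by signal level."""
    return "high" if signal >= threshold else "low"


# Toy calibration data (ppm) and a hand-picked kernel scale per regime.
TOY = {
    "low": ([0.5, 1.0, 2.0, 4.0, 5.0], 0.01),
    "high": ([50.0, 100.0, 200.0, 400.0, 500.0], 1.0),
}
MODELS = {
    name: KELM(C=1e6, scale=scale).fit([0.01 * c for c in concs], concs)
    for name, (concs, scale) in TOY.items()
}


def predict_concentration(signal):
    """Pick the regime model with the gate, then regress concentration (ppm)."""
    return MODELS[gate(signal)].predict(signal)


print(predict_concentration(0.02), predict_concentration(2.0))
```

Fitting one regressor per regime lets each use a kernel width suited to its own signal range, which is one plausible reason a single model generalizes poorly across 0.5–500 ppm.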
{"title":"A hybrid SVM-CPSO-KELM model for the simultaneous detection of methane, ethane, and ethylene via photoacoustic spectroscopy","authors":"Meixuan Zhao, Pengcheng Gu, Yuwang Han","doi":"10.1016/j.chemolab.2025.105620","journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"269","pages":"Article 105620"},"PeriodicalIF":3.8,"publicationDate":"2025-12-17","publicationTypes":"Journal Article","isOpenAccess":false}