Pub Date: 2026-02-15; Epub Date: 2025-12-17; DOI: 10.1016/j.chemolab.2025.105617
Mohammed Saif Ismail Hameed, Robin van der Haar, Ying Chen, Peter Goos
When experimental tests differ in cost and the experiment is constrained by a fixed total budget, the optimal number of tests and the allocation between expensive and inexpensive tests cannot be determined a priori. We propose using a Variable Neighborhood Search (VNS) algorithm to generate optimal experimental designs for such problems. VNS is an intuitive and flexible metaheuristic that has been successfully applied to a wide range of optimization problems. We illustrate the effectiveness of the VNS algorithm by generating improved designs for a micronization experiment.
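As a rough illustration of the idea (not the authors' algorithm), the sketch below runs a variable neighborhood descent, the deterministic local-search core of VNS, on a toy problem. The candidate set, the costs (1 for a cheap test, 3 for an expensive one), the budget of 12, the three neighborhoods and the D-optimality criterion are all assumptions made for this example.

```python
import itertools
import numpy as np

def log_det(design, F):
    """D-criterion: log det(X'X) for the runs in `design` (-inf if singular)."""
    X = F[design]
    sign, value = np.linalg.slogdet(X.T @ X)
    return value if sign > 0 else -np.inf

def total_cost(design, costs):
    return sum(costs[i] for i in design)

def neighbors(design, n_cand, k):
    """Three hypothetical neighborhoods of increasing size."""
    if k == 0:                                   # add one run
        for j in range(n_cand):
            yield design + [j]
    elif k == 1:                                 # exchange one run
        for pos in range(len(design)):
            for j in range(n_cand):
                if j != design[pos]:
                    yield design[:pos] + [j] + design[pos + 1:]
    else:                                        # drop one run, add two
        for pos in range(len(design)):
            rest = design[:pos] + design[pos + 1:]
            pairs = itertools.combinations_with_replacement(range(n_cand), 2)
            for j, l in pairs:
                yield rest + [j, l]

def vnd(F, costs, budget, start):
    """Move to the first improving feasible neighbor; widen the neighborhood on failure."""
    best, best_val = list(start), log_det(start, F)
    k = 0
    while k < 3:
        improved = False
        for cand in neighbors(best, len(F), k):
            if total_cost(cand, costs) <= budget:
                val = log_det(cand, F)
                if val > best_val + 1e-9:
                    best, best_val, improved = cand, val, True
                    break
        k = 0 if improved else k + 1             # restart from the smallest neighborhood
    return best, best_val

# Toy candidate set: one factor x at three levels, tested cheaply (t=0) or expensively (t=1).
# Model matrix columns: intercept, x, test type.
F = np.array([[1.0, x, t] for t in (0, 1) for x in (-1.0, 0.0, 1.0)])
costs = np.array([1, 1, 1, 3, 3, 3])
budget = 12
start = [0, 2, 3]                                # a small nonsingular starting design
best, best_val = vnd(F, costs, budget, start)
```

Neither the number of runs nor the split between cheap and expensive tests is fixed in advance here; both emerge from the search, which is the point the abstract makes. A full VNS would add a random shaking step to escape local optima.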
Title: Optimal design of experiments when not every test is equally expensive. Journal: Chemometrics and Intelligent Laboratory Systems, volume 269, Article 105617.
Pub Date: 2026-02-15; Epub Date: 2025-12-12; DOI: 10.1016/j.chemolab.2025.105615
Yinran Xiong, Jie Tang, Guangming Qiu, Peng Wang, Yuncan Chen, Jing Jing, Lijun Zhu
Agricultural products often exhibit substantial batch-to-batch variability in their chemical and physical properties due to environmental and other uncontrollable factors, making robust quality monitoring essential for ensuring product consistency and stability. Near-infrared (NIR) spectroscopy offers rich chemical and physical information for qualitative quality assessment, but its high dimensionality and the scarcity of abnormal samples (non-conforming products are not intentionally manufactured) limit the applicability of conventional supervised learning approaches. To address these challenges, this study proposes Covariance-Shrunk Slow Feature Analysis (CSSFA), a novel unsupervised learning method that integrates covariance shrinkage into the Slow Feature Analysis (SFA) framework. CSSFA mitigates estimation bias in high-dimensional settings and improves the robustness and interpretability of extracted features. Experiments on two NIR tobacco datasets demonstrate that CSSFA effectively captures features related to product quality stability and achieves accurate anomaly detection without requiring large numbers of abnormal samples. This work provides a scalable and interpretable strategy for anomaly detection in agricultural products using NIR spectroscopy when abnormal samples are rare or unavailable.
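The abstract does not spell out how shrinkage enters SFA, so the following is only a minimal sketch under stated assumptions: the sample covariance is shrunk linearly toward a scaled identity with a fixed, hand-picked intensity `alpha`, and the slow features are the directions with the smallest variance of sample-to-sample differences after whitening with the shrunk covariance. The function names and the random data are hypothetical.

```python
import numpy as np

def shrink_cov(X, alpha):
    """Linear shrinkage of the sample covariance toward a scaled identity (assumed form)."""
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / (len(X) - 1)
    p = S.shape[0]
    return (1 - alpha) * S + alpha * (np.trace(S) / p) * np.eye(p)

def shrunk_sfa(X, alpha=0.2, n_features=2):
    """Slow feature directions computed with a shrunk covariance matrix."""
    Xc = X - X.mean(axis=0)
    B = shrink_cov(X, alpha)
    dX = np.diff(Xc, axis=0)                    # differences between consecutive samples
    A = dX.T @ dX / (len(dX) - 1)
    evals, evecs = np.linalg.eigh(B)
    whiten = evecs / np.sqrt(evals)             # whiten.T @ B @ whiten = I
    slow_vals, V = np.linalg.eigh(whiten.T @ A @ whiten)   # ascending: slowest first
    W = whiten @ V[:, :n_features]
    return W, slow_vals[:n_features]

# Fewer samples than variables, as is typical for spectra: the plain sample
# covariance is singular here, while the shrunk one is invertible.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
W, vals = shrunk_sfa(X, alpha=0.2)
```

The shrinkage step is what makes the whitening well defined when there are more wavelengths than samples; how CSSFA chooses the shrinkage intensity is not stated in the abstract.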
Title: An unsupervised approach to anomaly detection in near-infrared spectroscopy via Covariance-Shrunk Slow Feature Analysis. Journal: Chemometrics and Intelligent Laboratory Systems, volume 269, Article 105615.
Pub Date: 2026-02-15; Epub Date: 2025-12-12; DOI: 10.1016/j.chemolab.2025.105616
Luiz Renato Rosa Leme de Souza, Carlos Alberto Rios, Márcia Cristina Breitkreitz
Context:
Python is a widely known, robust, open-source programming language used in many research fields. Artificial Intelligence (AI) is a growing tool with many applications that can help with long and difficult tasks. Routines for preprocessing spectral signals and applying chemometric models are usually part of expensive software. Despite the existence of isolated code snippets, libraries, and tutorials, it is hard to find an open-access routine that leads from the raw Raman mapping data set to the chemical information contained in the analyzed samples, presented as chemical maps.
Objectives:
This paper presents an AI-assisted Python-based routine for preprocessing Raman mapping results and generating chemical maps of samples using three chemometric methods, classical least squares (CLS), partial least squares (PLS) and principal component analysis (PCA), with the goal of providing an open-access routine for research purposes.
Methods:
The routine was written in Python, with AI tools used as code generators, translators, and debugging aids, and its results were compared with those obtained by a MATLAB routine.
Results:
The Python routine successfully performed the preprocessing of the Raman spectra and the calculations of the chemometric methods CLS, PLS and PCA, generating chemical maps. The results were equivalent to those of MATLAB for the same data set, leading to the same conclusions.
Conclusion:
This paper demonstrated the application of an open access Python-based AI-guided routine to preprocess and generate chemical maps applying CLS, PCA and PLS models, now available and editable to suit different needs.
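To make the CLS step concrete, here is a minimal sketch that is independent of the published routine. It assumes a hyperspectral cube of shape (rows, columns, wavenumbers) and known pure-component spectra, and estimates one concentration map per component by least squares. The function name `cls_maps` and the synthetic data are hypothetical.

```python
import numpy as np

def cls_maps(cube, pure_spectra):
    """Classical least squares: solve D = C S for C, then fold C back into maps.

    cube         : (rows, cols, n_wavenumbers) Raman mapping data
    pure_spectra : (n_components, n_wavenumbers) reference spectra
    returns      : (rows, cols, n_components) concentration maps
    """
    rows, cols, n_wn = cube.shape
    D = cube.reshape(rows * cols, n_wn)          # unfold pixels into rows
    S = pure_spectra.T                           # (n_wavenumbers, n_components)
    C, *_ = np.linalg.lstsq(S, D.T, rcond=None)  # (n_components, n_pixels)
    return C.T.reshape(rows, cols, -1)

# Synthetic check: two Gaussian bands mixed with known concentrations.
axis = np.linspace(0, 1, 50)
pure = np.vstack([np.exp(-((axis - 0.3) / 0.05) ** 2),
                  np.exp(-((axis - 0.7) / 0.08) ** 2)])
rng = np.random.default_rng(1)
conc = rng.uniform(0, 1, size=(4, 5, 2))
cube = conc @ pure                               # noise-free mixture spectra
maps = cls_maps(cube, pure)
```

On noise-free data the maps reproduce the concentrations exactly. Real spectra would first need the preprocessing steps the abstract mentions (for example baseline correction), which are not shown here.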
Title: Raman mapping and Chemometrics: An open access Python-based routine to preprocess and generate chemical maps applying CLS, PCA and PLS methods. Journal: Chemometrics and Intelligent Laboratory Systems, volume 269, Article 105616.
Pub Date: 2026-02-15; Epub Date: 2025-12-17; DOI: 10.1016/j.chemolab.2025.105614
Haoran Li, Xin Zhang, Pengchegn Wu, Yang Zhang, Jiyong Shi, Xiaobo Zou
Advances in spectral techniques have generated high-resolution data with thousands of variables. Although an increasing number of variables provides more comprehensive molecular information, it also poses challenges for existing chemometric methods, such as the risk of over-fitting and the lack of interpretability. Therefore, we propose a hybrid variable selection approach specifically designed for large-scale datasets. First, considering the continuous characteristics of spectral variables and their importance, interval partial least squares (iPLS) and variable combination population analysis (VCPA) are applied to select relevant variables while reducing the variable space. Second, we assume that truly relevant variables exhibit consistent importance across the sample domain for the same analytical task and are therefore more likely to be selected and retained. Consequently, a cross-domain constrained ensemble (CCE) strategy is developed using the least absolute shrinkage and selection operator (LASSO) to further enhance the performance of variable selection. Experiments on wine ¹H NMR and pork Raman spectroscopy datasets demonstrate that the proposed method improves prediction performance in terms of RMSEP and RPD. In addition, the proposed CCE method yields larger prediction improvements than other final selection methods. These results confirm the effectiveness of both the hybrid variable selection framework and the CCE strategy in handling large-scale spectral datasets.
Title: A hybrid variable selection with cross-domain constrained ensemble (CCE) for large-scale spectroscopic data. Journal: Chemometrics and Intelligent Laboratory Systems, volume 269, Article 105614.
Pub Date: 2026-02-15; Epub Date: 2025-12-08; DOI: 10.1016/j.chemolab.2025.105606
Sergej Papoci, Manuel Jiménez, Michele Ghidotti, María Beatriz de la Calle Guntiñas
The willingness of consumers to pay higher prices for high-quality specialties, such as Darjeeling tea, goes hand in hand with an increase in fraudulent practices in which Darjeeling tea is substituted totally or partially by cheaper teas. Currently, to evaluate the percentage of substitution that a method can detect, Darjeeling tea is mixed in different proportions with non-Darjeeling teas, and after homogenisation the mixture is analysed. This time-consuming approach consumes valuable amounts of sample, and therefore an alternative approach is needed. Here a method is described to calculate the minimum detectable percentage of substitution of Darjeeling tea by other teas without needing to prepare real mixtures. The approach is based on virtual mixtures computed from the results obtained for commercially available Darjeeling and non-Darjeeling teas. The authentication method used the elemental profiles of tea obtained by Energy Dispersive X-ray Fluorescence, combined with chemometrics and modelling by Partial Least Squares-Discriminant Analysis. The percentage of false positives at different substitution levels was evaluated and compared with the results obtained with real mixtures of Darjeeling and non-Darjeeling teas. Comparable results were obtained with both approaches. Twenty percent was the lowest substitution level that could be detected with acceptable sensitivity (94 %) and specificity (86 %). A fast, easy-to-implement approach has been developed and validated to calculate the minimum substitution percentage that can be detected by an analytical authentication method, without the need to carry out additional laboratory experiments.
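A virtual mixture can be sketched as a weighted average of two measured elemental profiles. The abstract does not give the mixing rule, so the linear, mass-fraction weighting below is an assumption, and the function names and concentration values are purely hypothetical.

```python
import numpy as np

def virtual_mixture(darjeeling, other, fraction):
    """Profile of a blend in which `fraction` of the Darjeeling is replaced by `other`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    d = np.asarray(darjeeling, dtype=float)
    o = np.asarray(other, dtype=float)
    return (1.0 - fraction) * d + fraction * o

def all_virtual_mixtures(D, O, fraction):
    """Blend every Darjeeling sample (rows of D) with every other tea (rows of O)."""
    D = np.asarray(D, dtype=float)
    O = np.asarray(O, dtype=float)
    mixed = (1.0 - fraction) * D[:, None, :] + fraction * O[None, :, :]
    return mixed.reshape(-1, D.shape[1])

# Hypothetical elemental concentrations (arbitrary units) for three elements.
D = np.array([[10.0, 2.0, 0.5],
              [12.0, 1.8, 0.6]])
O = np.array([[20.0, 4.0, 0.1],
              [18.0, 3.0, 0.2],
              [22.0, 5.0, 0.3]])
mix20 = all_virtual_mixtures(D, O, 0.20)        # 20 % substitution, 2 x 3 = 6 blends
```

Each row of `mix20` could then be passed to an already trained classifier to estimate how often a 20 % substitution is flagged, without preparing any physical mixture.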
Title: Evaluation of a mathematical approach to detect fraudulent substitution of Darjeeling tea with other types of tea using the elemental profiles obtained by Energy Dispersive X-ray Fluorescence. Journal: Chemometrics and Intelligent Laboratory Systems, volume 269, Article 105606.
Pub Date: 2026-02-15; Epub Date: 2025-12-01; DOI: 10.1016/j.chemolab.2025.105601
Maoyuan Zhou, Jingjie He, Xingyu Liu, Junmin Huang, Jirui Zhang, Jiaxing Li, Xiaorui Huang, Qianjin Guo
Accurately assessing drug-target affinity (DTA), the strength of a drug-target interaction, is pivotal in drug development. Improving the precision of DTA prediction requires effective protein representation methods. This study introduces MAGTSF-DTA, a multi-modal feature fusion semantic framework leveraging Mamba and graph convolutional networks (GCN). For molecules, atomic-level graph structures are generated from SMILES sequences, and the Mamba module is integrated with GCN to achieve efficient semantic learning. Furthermore, protein-protein interaction (PPI) networks are incorporated, and hierarchical approaches (HMANet and LMANet) are designed to integrate diverse protein features, enriching protein semantic representations. Experiments demonstrate that the proposed model significantly improves prediction accuracy on benchmark datasets compared to state-of-the-art techniques, validating the effectiveness of the Mamba architecture in DTA prediction and showcasing the model's generalization and interpretability.
Title: A semantic framework for drug-target affinity prediction using Mamba and graph convolutional networks for multimodal feature fusion. Journal: Chemometrics and Intelligent Laboratory Systems, volume 269, Article 105601.
Pub Date: 2026-02-15; Epub Date: 2026-01-06; DOI: 10.1016/j.chemolab.2026.105628
Shuxian Su, Peixian Geng, Jiashan Yang, Gansu Zhang, Gehang Xue, Liang Dong, Zhaolin Lu
Ash content is a key quality index in coal preparation. To overcome the time-consuming, labor-intensive nature of offline assays, the interference sensitivity of online measurements, and the surface-only information of vision/spectroscopy, this study proposes a multimodal deep-learning framework for fast and accurate raw-coal ash prediction. First, a CT-based data acquisition system was designed to synchronously collect CT image slices of raw-coal particles and particle-density information. Quantitative analysis of the association between density and ash content reveals the decisive role of density, a key indicator of coal properties, in ash prediction. Building on this finding and on further mining of the frequency-domain and spatial features of coal CT images, a multimodal approach is formulated that fuses CT features with density. The model adopts a three-branch architecture: an improved EfficientNet-B0 branch learns spatial cues, an AshFormer branch captures frequency patterns related to mineral distribution and microstructural discontinuities, and a multilayer perceptron encodes density. Cross-modal attention achieves deep fusion and complementarity across modalities, and a KAN-based regression head outputs ash content. On industrial data, the proposed method attains MAPE = 0.0468, RMSE = 0.0573, and R² = 97.0 %, outperforming single-modality image models (ΔMAPE = −0.0096, ΔRMSE = −0.0413, ΔR² = +4.48 percentage points). These results demonstrate the advantage of multimodal fusion in improving the accuracy and generalization of coal-quality analysis and provide a new approach for rapid analysis under small-sample conditions.
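For readers who want to reproduce the three reported metrics on their own predictions, the standard definitions can be sketched as follows. The ash values in the example are made up for illustration and are not taken from the paper; MAPE is returned as a fraction, which appears to match the scale of the reported 0.0468.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, returned as a fraction."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))

def rmse(y_true, y_pred):
    """Root mean squared error, in the units of y."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up ash contents (%) and predictions.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [11.0, 19.0, 33.0, 38.0]
scores = (mape(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```

For these made-up numbers the scores are MAPE = 0.075, RMSE ≈ 1.936 and R² = 0.97.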
Title: Multimodal fusion of CT features and density for rapid prediction of raw-coal ash. Journal: Chemometrics and Intelligent Laboratory Systems, volume 269, Article 105628.
Pub Date: 2026-02-15; Epub Date: 2026-01-09; DOI: 10.1016/j.chemolab.2026.105636
Bingtao Zhao, Lu Ding, Yaxin Su
Accurate modeling of the thermophysical properties of CO₂ in the region around the critical point (RACP), that is, in the near-critical state, is crucial for performance assessment and process design in CO₂ capture, utilization, and storage. Despite its importance, significant challenges persist because of sharp fluctuations in these properties induced by critical effects, which directly influence flow behavior, interfacial tension, and process performance. To address this issue, we develop a robust framework based on a Bayesian regularized neural network (BRNN) to predict the density, viscosity, and thermal conductivity of CO₂ within the RACP. The first step is to construct a backpropagation neural network, with the Kennard-Stone algorithm used for data partitioning. A comprehensive analysis is conducted to evaluate the impacts of training algorithms, the number of neurons in the hidden layer, and optimization methods on network performance. By refining the training procedure and optimizing weights and thresholds using the genetic algorithm, we ultimately propose a more accurate model named GA-BRNN. This model generalizes better than traditional correlations and other machine learning models for the prediction of CO₂ properties in the RACP, yielding mean squared errors of 2.0484 × 10⁻⁴ (R² = 0.9635) for density, 1.8680 (R² = 0.9743) for viscosity, and 5.2196 (R² = 0.9900) for thermal conductivity. The findings may provide a useful reference for modeling the thermophysical properties of near-critical-state CO₂ in processes related to carbon capture, utilization, and storage.
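The Kennard-Stone partitioning mentioned above can be sketched in a few lines. This is the textbook version with Euclidean distances, not necessarily the exact variant used in the paper; the function name and the one-dimensional example data are hypothetical.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return indices of `n_select` samples that cover the input space evenly."""
    X = np.asarray(X, dtype=float)
    if not 2 <= n_select <= len(X):
        raise ValueError("n_select must be between 2 and the number of samples")
    # Pairwise Euclidean distances between all samples.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(D), D.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select:
        # For each remaining sample, distance to its nearest selected sample.
        min_dist = D[np.ix_(remaining, selected)].min(axis=1)
        # Pick the sample that is farthest from everything selected so far.
        pick = remaining[int(np.argmax(min_dist))]
        selected.append(pick)
        remaining.remove(pick)
    return selected

# Toy one-dimensional example, e.g. five temperatures on an arbitrary scale.
X = [[0.0], [1.0], [2.0], [3.0], [10.0]]
train_idx = kennard_stone(X, 3)
```

Here `train_idx` is `[0, 4, 3]`: the two extremes are chosen first, then the point farthest from both. The selected samples would form the training set and the rest the test set.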
Title: A Bayesian network-based robust framework for determining density, viscosity, and thermal conductivity of near-critical-state CO2 applied in CCUS. Journal: Chemometrics and Intelligent Laboratory Systems, volume 269, Article 105636.
Pub Date : 2026-02-15 Epub Date: 2025-12-30 DOI: 10.1016/j.chemolab.2025.105627
Zichuan Bu, Jihong Liu, Jiageng Zhang, Chi Liu, Yihua Liu, Kaili Ren, Xuewen Yan, Wei Gao, Jun Dong
Raman spectroscopy is a pivotal tool in analytical and physical chemistry, yet its application to complex systems is hindered by spectral superposition and the resulting analysis challenges. Advances in deep learning have opened new approaches to the component analysis of complex mixtures. This study proposes a mixture component identification method, MCI, based on a masked autoencoder (MAE) and a convolutional neural network (CNN), to address both qualitative recognition and quantitative analysis in the Raman spectra of mixtures. The MCI method adopts a multi-stage framework. First, the Voigt function is used to accurately extract the characteristic peaks of the mixture. Second, the MAE model reconstructs the corresponding pure-substance spectra. Third, the CNN model performs qualitative and quantitative analyses on the reconstructed spectra. Finally, the spectrum of the remaining components is obtained by subtracting the reconstructed spectrum from the mixture spectrum. Iterating this process unmixes complex mixtures step by step. On generated mixed-sample test data, MCI outperforms the three comparison models in complete recognition accuracy for qualitative analysis and in the per-substance evaluation indicators, while maintaining a lower average concentration error in quantitative analysis. Moreover, for complex mixtures containing interfering substances, MCI shows strong anti-interference ability and maintains high identification accuracy. On measured Raman spectra of mixed samples, the MCI model achieved an average accuracy and F1 score of 97 % across all test samples, further verifying its reliability and practicality in detecting the main components of real, complex mixtures.
In summary, this study provides a new technical method for Raman spectral analysis of complex mixtures, with both theoretical significance and practical value.
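The identify-and-subtract loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the MAE reconstruction and CNN classification are replaced here by a much simpler stand-in (similarity matching against a small reference library followed by a least-squares scale factor), and a pseudo-Voigt peak shape is used in place of a fitted Voigt function. The names `pseudo_voigt`, `unmix`, `library`, and the synthetic components "A", "B", "C" are all hypothetical.

```python
import numpy as np


def pseudo_voigt(x, center, fwhm, eta=0.5):
    """Pseudo-Voigt peak: weighted sum of a Lorentzian and a Gaussian of equal FWHM."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    gauss = np.exp(-0.5 * ((x - center) / sigma) ** 2)
    lorentz = 1.0 / (1.0 + ((x - center) / (fwhm / 2.0)) ** 2)
    return eta * lorentz + (1.0 - eta) * gauss


def unmix(mixture, library, max_components=5, stop_fraction=0.05):
    """Greedy unmixing: identify one component, subtract it, repeat on the residual.

    library maps a (hypothetical) substance name to its pure spectrum,
    sampled on the same axis as the mixture.
    """
    residual = np.asarray(mixture, dtype=float).copy()
    total = np.linalg.norm(residual)
    found = {}
    for _ in range(max_components):
        # Stop once the residual carries only a small share of the signal.
        if np.linalg.norm(residual) < stop_fraction * total:
            break
        candidates = [name for name in library if name not in found]
        if not candidates:
            break
        # Stand-in for MAE + CNN identification: pick the reference with the
        # largest normalized projection (same ranking as cosine similarity).
        best = max(
            candidates,
            key=lambda n: residual @ library[n] / np.linalg.norm(library[n]),
        )
        ref = library[best]
        # Stand-in for quantification: least-squares scale of the reference.
        coeff = float(residual @ ref / (ref @ ref))
        if coeff <= 0.0:
            break
        found[best] = coeff
        # Subtract the identified component to expose the remaining ones.
        residual = residual - coeff * ref
    return found, residual


# Synthetic example: two of three library substances are present.
x = np.linspace(200.0, 1800.0, 1601)
library = {
    "A": pseudo_voigt(x, 400.0, 20.0) + pseudo_voigt(x, 1000.0, 20.0),
    "B": pseudo_voigt(x, 700.0, 20.0) + pseudo_voigt(x, 1400.0, 20.0),
    "C": pseudo_voigt(x, 1600.0, 20.0),
}
mixture = 0.7 * library["A"] + 0.3 * library["B"]
found, residual = unmix(mixture, library)
print(found)
```

The greedy scale factors are only approximate when reference peaks overlap, which is one reason a learned reconstruction step may be preferable for real mixtures.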
{"title":"Deep learning-driven components analysis of Raman spectral mixtures: An integrated masked autoencoder with convolutional neural network approach","authors":"Zichuan Bu , Jihong Liu , Jiageng Zhang , Chi Liu , Yihua Liu , Kaili Ren , Xuewen Yan , Wei Gao , Jun Dong","doi":"10.1016/j.chemolab.2025.105627","DOIUrl":"10.1016/j.chemolab.2025.105627","url":null,"abstract":"<div><div>Raman spectroscopy is a pivotal tool in analytical and physical chemistry, yet its application in complex systems is hindered by spectral superposition and analysis challenges. The development of deep learning technology has provided new ideas for the component analysis of complex mixtures. This study proposes a mixture component identification method named MCI, which is based on the masked autoencoder and convolutional neural network. The aim is to effectively solve the problems of qualitative recognition and quantitative analysis in the Raman spectra of mixtures. The MCI method adopts a multi-stage framework: First, the Voigt function is used to accurately extract the characteristic peaks of the mixture. Second, the MAE model is employed to reconstruct the corresponding pure-substance spectra. Then, the CNN model is combined to conduct qualitative and quantitative analyses on the reconstructed spectra. Finally, the spectrum of the remaining components is obtained by subtracting the reconstructed spectrum from the mixture spectrum. By iterating the above process, the step-by-step unmixing of complex mixtures is achieved. In the generated mixed sample test data, the MCI outperforms the other three comparative models in terms of complete recognition accuracy in qualitative analysis and the evaluation indicators of each substance, while maintaining a lower average concentration error in quantitative analysis. Moreover, for complex mixtures containing interfering substances, the MCI shows strong anti-interference ability and maintains a high Identification accuracy. 
In the actual measurement of mixed sample Raman spectral identification detection, The MCI model achieved an average accuracy and <em>F1_Score</em> of 97 % in all test samples, further verifying its reliability and practicality in detecting the main components of real and complex mixtures. In summary, this study provides a new technical method for Raman spectral analysis of complex mixtures, which holds certain theoretical significance and practical value.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"269 ","pages":"Article 105627"},"PeriodicalIF":3.8,"publicationDate":"2026-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145880468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-15 Epub Date: 2025-11-19 DOI: 10.1016/j.chemolab.2025.105586
Ander Bastida Urkiza, Eneko Lopez, Renata Matekalo, Andreas Seifert
Photonic techniques combined with chemometrics offer promising opportunities for next-generation medical in vitro diagnostics. In this work, we evaluate the ability of vibrational spectroscopy to distinguish viral respiratory infections that have progressed to pneumonia. Pneumonia remains a major global health burden and is currently the eighth leading cause of death worldwide. Reliable and differentiated diagnoses are still challenging, as existing methods are time-consuming and require specialized laboratories and expertise.
We present a rapid, machine-learning–based in vitro approach for classifying influenza A and seasonal flu and for discriminating between these pathogenic strains. To enhance diagnostic performance, we employ data fusion of complementary Raman and Fourier-transform infrared absorption spectra acquired from microliter-scale droplets of human blood plasma.
By integrating spectral information from both modalities, the models capture a broader range of physiological changes and more comprehensively reflect the biochemical profile of the samples, leading to more robust classification. Using generalized linear models, we achieve accuracies of up to 95% in distinguishing healthy controls from influenza A– and seasonal flu–infected samples. The results further highlight specific scenarios in which data fusion yields measurable improvements in predictive power.
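As a rough illustration of what fusing two spectral blocks before classification can look like, the sketch below uses low-level fusion (concatenation of autoscaled blocks) followed by logistic regression, which is one example of a generalized linear model. The abstract does not state which fusion level or which GLM the authors used, so both choices are assumptions. The data are synthetic random numbers rather than spectra, and the function names `autoscale`, `fuse_low_level`, and `fit_logistic` are hypothetical.

```python
import numpy as np


def autoscale(block):
    """Mean-centre each column and scale it to unit variance."""
    mean = block.mean(axis=0)
    std = block.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    return (block - mean) / std


def fuse_low_level(raman, ftir):
    """Autoscale each block, weight it by 1/sqrt(n_features), then concatenate.

    The block weighting keeps the modality with more variables from
    dominating the fused matrix.
    """
    blocks = [autoscale(b) / np.sqrt(b.shape[1]) for b in (raman, ftir)]
    return np.hstack(blocks)


def fit_logistic(X, y, lr=0.5, n_iter=2000, l2=1e-3):
    """Binary logistic regression (GLM with logit link) by gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b


# Synthetic stand-in data: 200 samples, 50 "Raman" and 40 "FTIR" variables,
# with a class-dependent shift in the first five variables of each block.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
raman = rng.normal(size=(200, 50))
ftir = rng.normal(size=(200, 40))
raman[:, :5] += 1.5 * y[:, None]
ftir[:, :5] += 1.5 * y[:, None]

fused = fuse_low_level(raman, ftir)
w, b = fit_logistic(fused, y)
accuracy = np.mean((fused @ w + b > 0).astype(int) == y)
print(fused.shape, accuracy)
```

The accuracy printed here is a training accuracy on synthetic data and says nothing about the study's results; a real evaluation would need held-out samples or cross-validation.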
{"title":"Boosting pneumonia diagnosis with machine learning and spectroscopic data fusion techniques","authors":"Ander Bastida Urkiza , Eneko Lopez , Renata Matekalo , Andreas Seifert","doi":"10.1016/j.chemolab.2025.105586","DOIUrl":"10.1016/j.chemolab.2025.105586","url":null,"abstract":"<div><div>Photonic techniques combined with chemometrics offer promising opportunities for next-generation medical in vitro diagnostics. In this work, we evaluate the ability of vibrational spectroscopy to distinguish viral respiratory infections that have progressed to pneumonia. Pneumonia remains a major global health burden and is currently the eighth leading cause of death worldwide. Reliable and differentiated diagnoses are still challenging, as existing methods are time-consuming and require specialized laboratories and expertise.</div><div>We present a rapid, machine-learning–based in vitro approach for classifying influenza A and seasonal flu and for discriminating between these pathogenic strains. To enhance diagnostic performance, we employ data fusion of complementary Raman and Fourier-transform infrared absorption spectra acquired from microliter-scale droplets of human blood plasma.</div><div>By integrating spectral information from both modalities, the models capture a broader range of physiological changes and more comprehensively reflect the biochemical profile of the samples, leading to more robust classification. Using generalized linear models, we achieve accuracies of up to 95% in distinguishing healthy controls from influenza A– and seasonal flu–infected samples. 
The results further highlight specific scenarios in which data fusion yields measurable improvements in predictive power.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"268 ","pages":"Article 105586"},"PeriodicalIF":3.8,"publicationDate":"2026-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145569651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}