Pub Date: 2025-12-15 | Epub Date: 2025-09-11 | DOI: 10.1016/j.chemolab.2025.105533
Shiyu Liu, Xuan Liu, Shutao Wang, Chunhai Hu, Lide Fang, Xiaoli Yan
Near-infrared (NIR) spectra contain a large number of overlapping absorption variables, and the number of variables typically far exceeds the number of available samples. Variable selection is widely regarded as an effective strategy for mitigating the curse of dimensionality in high-dimensional spectral datasets. This study presents a dual-stage variable selection scheme, termed JMIM-RFE, for high-dimensional spectral data analysis. It integrates recursive feature elimination (RFE) with maximum-of-the-minimum joint mutual information (JMIM) and is implemented through support vector machine (SVM) classification. JMIM is first applied as a fast static filter that removes redundant and irrelevant variables; RFE then refines the remaining set iteratively, shrinking the variable space while retaining critical spectral features. To assess its efficacy, validation experiments were carried out on three distinct high-dimensional NIR datasets, with particular attention directed towa
Title: Dual-stage variable selection: Integrating static filtering and dynamic refinement for high-dimensional NIR analysis
Journal: Chemometrics and Intelligent Laboratory Systems, vol. 267, Article 105533
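The static filtering stage described in this entry can be illustrated with a small sketch of the JMIM criterion: after seeding with the single most relevant variable, it greedily adds the candidate whose worst-case joint mutual information with the already selected variables and the class label is largest. The sketch assumes discretised variables (continuous NIR absorbances would need binning first), and the names `mutual_info` and `jmim_select` are hypothetical. It is not the authors' implementation, and the SVM-based RFE refinement stage is not shown.

```python
from collections import Counter
from math import log2


def mutual_info(xs, ys):
    """Mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def jmim_select(columns, labels, k):
    """Greedy JMIM filter: return the indices of k selected columns.

    columns: list of discrete feature columns (one list per variable).
    labels:  class label per sample.
    """
    remaining = list(range(len(columns)))
    # Seed with the variable that is most informative on its own.
    first = max(remaining, key=lambda j: mutual_info(columns[j], labels))
    selected = [first]
    remaining.remove(first)

    while remaining and len(selected) < k:
        def score(j):
            # Worst-case joint information of candidate j paired with
            # each variable that has already been selected.
            return min(
                mutual_info(list(zip(columns[j], columns[s])), labels)
                for s in selected
            )
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

On a toy XOR problem, where no single variable is informative but one pair determines the label, the pairwise criterion picks the complementary variable over an irrelevant one. A wrapper stage such as RFE would then start from this reduced set.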
Pub Date: 2025-12-15 | Epub Date: 2025-08-30 | DOI: 10.1016/j.chemolab.2025.105520
Ajay L. Vishwakarma, Shruti O. Varma, M.R. Sonawane, Ajay Chaudhari
The impact of salinity on soil has become a major environmental challenge due to global warming and urbanization. The electrical properties of soil are influenced by its physicochemical properties, salinity level, moisture content, and the geological features of the land. This work aimed to evaluate the electrical and chemical properties of agricultural, riparian zone, and near-seafront salt marsh soils using a PC-based automated microwave X-band bench method at a frequency of 9.55 GHz with the 'infinite sample' technique. In addition, chemical properties such as pH, sodium adsorption ratio (SAR), exchangeable sodium percentage (ESP), organic carbon (OC), phosphorus (P), potassium (K), and micronutrients (Fe, Mn, Cu, and Zn), and physical properties such as porosity (PO) and particle and bulk density (PD and BD), were measured in triplicate using laboratory methods. Hierarchical Cluster Analysis (HCA) and Principal Component Analysis (PCA) were employed to classify and differentiate samples based on their properties, providing insights into underlying patterns and groupings. To estimate the dielectric constant (DC) and dielectric loss (DL), we implemented Multiple Linear Regression (MLR) and an Artificial Neural Network (ANN) model using a feed-forward backpropagation network. Model performance and predictive accuracy were evaluated with the Root Mean Square Error (RMSE) and the coefficient of determination (R²). With PO, BD, PD, P, OC, K, and ESP as input variables, the ANN model gave R² and RMSE values of 0.99 and 9.23 × 10⁻⁴ for the dielectric constant, and 0.98 and 2.93 × 10⁻² for the dielectric loss. For MLR, the R² values for the dielectric constant and dielectric loss were 0.88 and 0.80, respectively. SHAP (SHapley Additive exPlanations) analysis, combined with the ANN model, revealed that the DC is influenced by ESP, while the DL is only slightly affected. Thus, the ANN model, interpreted with SHAP, accurately predicted the dielectric properties of soil, offering a nondestructive and efficient approach for monitoring salinity effects on soil health.
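The two evaluation metrics named in this abstract have standard definitions, sketched here in plain Python. The function names and the sample values in the usage note are illustrative only and are not taken from the study.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

For observed values `[1, 2, 3, 4]` and predictions that are each 0.5 too high, the RMSE is 0.5 and R² is 0.8 (residual sum of squares 1 against a total sum of squares 5).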
Title: Implementation of artificial intelligence and multivariate analysis to analyze electrical and physicochemical properties of seawater-affected agriculture soil
Journal: Chemometrics and Intelligent Laboratory Systems, vol. 267, Article 105520
Pub Date: 2025-12-15 | Epub Date: 2025-09-19 | DOI: 10.1016/j.chemolab.2025.105537
Pooja Devi, Bhuvaneshvar Kumar
This study explores the flow dynamics and thermal characteristics of a tetrahybrid nanofluid over a stretching cylinder, considering the effects of a magnetic field and internal heat generation. Two distinct tetrahybrid nanofluids are examined for the comparative analysis of temperature, pressure, velocity distributions, skin friction, and heat transfer performance: one composed of Ag+SiO₂+TiO₂+Al₂O₃ suspended in kerosene oil, and the other consisting of Au+CuO+Fe₃O₄+Multi-Walled Carbon Nanotubes (MWCNTs) dispersed in water. The governing equations are solved numerically using the fourth-order Runge–Kutta method coupled with a shooting strategy and artificial neural network (ANN). Parametric studies revealed that the Au+CuO+Fe₃O₄+MWCNTs nanofluid exhibited superior thermal performance, characterized by higher Nusselt numbers, while the Ag+SiO₂+TiO₂+Al₂O₃ nanofluid provided enhanced momentum transport and higher velocity profiles. Au+CuO+Fe₃O₄+MWCNTs shows stronger pressure resistance near the surface, while Ag+SiO₂+TiO₂+Al₂O₃ yields greater skin friction due to higher effective viscosity. An artificial neural network (ANN) was trained using Bayesian regularization to accurately predict skin friction and Nusselt number values. The Au+CuO+Fe₃
Title: Artificial neural network-assisted study on thermohydrodynamic behavior of tetrahybrid nanofluids in a porous stretching cylinder
Journal: Chemometrics and Intelligent Laboratory Systems, vol. 267, Article 105537
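The numerical strategy named in this entry, fourth-order Runge–Kutta integration combined with shooting, can be sketched on a toy boundary value problem. The example below is an assumption-laden illustration: it solves y'' = -y with y(0) = 0 and y(π/2) = 1, not the paper's governing equations, and the function names `rk4` and `shoot` are hypothetical. The unknown initial slope is adjusted with secant iterations until the far boundary condition is met.

```python
import math


def rk4(f, y0, t0, t1, steps):
    """Integrate the first-order system y' = f(t, y) from t0 to t1 with classic RK4."""
    h = (t1 - t0) / steps
    t, y = t0, list(y0)
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k1)])
        k3 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k2)])
        k4 = f(t + h, [yi + h * k for yi, k in zip(y, k3)])
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
    return y


def shoot(f, left_value, right_value, t0, t1, s0, s1,
          steps=200, tol=1e-10, max_iter=50):
    """Find the initial slope s such that y(t1) matches right_value (secant method)."""
    def residual(s):
        return rk4(f, [left_value, s], t0, t1, steps)[0] - right_value

    r0, r1 = residual(s0), residual(s1)
    for _ in range(max_iter):
        if abs(r1) < tol or r1 == r0:
            break
        # Secant update on the guessed slope; the right-hand side uses old values.
        s0, s1, r0 = s1, s1 - r1 * (s1 - s0) / (r1 - r0), r1
        r1 = residual(s1)
    return s1


def oscillator(t, y):
    """y'' = -y written as a first-order system [y, y']."""
    return [y[1], -y[0]]


slope = shoot(oscillator, 0.0, 1.0, 0.0, math.pi / 2, 0.0, 2.0)
```

The exact solution is y = sin(t), so the recovered slope should be close to 1. For nonlinear similarity equations the same loop applies, but the secant iteration generally needs more steps and a reasonable starting bracket.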
Pub Date: 2025-12-15 | Epub Date: 2025-10-11 | DOI: 10.1016/j.chemolab.2025.105550
Changrui Xie, Xi Chen
Industrial processes often exhibit multimode characteristics due to factors such as load variations, equipment changes, and feedstock fluctuations. This paper introduces a Dirichlet Process-based Twofold-Robust Mixture Regression Model (DPR²MRM) for multimode processes. As a Bayesian nonparametric model, it automatically determines the number of mixture components from observed data using Dirichlet process mixture techniques, avoiding underfitting and overfitting. The model employs a Student's-t mixture model for input space learning, leveraging its long-tail properties for robust mode identification. For each mode, a regression model is built to capture the relationship between inputs and outputs, incorporating Student's-t noise to ensure robustness against output space outliers. The optimal posteriors of the model parameters are inferred within a full Bayesian framework, and an analytical posterior predictive distribution is derived. The effectiveness of the DPR²MRM is demonstrated through a numerical example and two industrial applications.
Title: Robust soft sensor development based on Dirichlet process mixture of regression model for multimode processes
Journal: Chemometrics and Intelligent Laboratory Systems, vol. 267, Article 105550
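The robustness argument in this entry rests on the long tails of the Student's-t distribution: an outlier is penalised far less under a t likelihood than under a Gaussian one, so it pulls the fit less. A minimal univariate sketch, with hypothetical function names and arbitrary parameter values (it does not reproduce the DPR²MRM model or its inference):

```python
from math import lgamma, log, pi


def student_t_logpdf(x, mu, sigma, nu):
    """Log density of a location-scale Student's-t with nu degrees of freedom."""
    z = (x - mu) / sigma
    return (lgamma((nu + 1) / 2) - lgamma(nu / 2)
            - 0.5 * log(nu * pi * sigma ** 2)
            - (nu + 1) / 2 * log(1 + z * z / nu))


def gaussian_logpdf(x, mu, sigma):
    """Log density of a normal distribution."""
    z = (x - mu) / sigma
    return -0.5 * log(2 * pi * sigma ** 2) - 0.5 * z * z


# Log-likelihood of an observation 10 scale units from the centre.
outlier_t = student_t_logpdf(10.0, 0.0, 1.0, 3.0)
outlier_gauss = gaussian_logpdf(10.0, 0.0, 1.0)
```

Here `outlier_t` is roughly -8 while `outlier_gauss` is roughly -51, which is why a single gross outlier dominates a Gaussian fit but not a Student's-t one.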
Acute Coronary Syndrome (ACS) is a prevalent cardiovascular disease characterized by high incidence and mortality rates. Numerous studies have focused on utilizing artificial intelligence and machine learning algorithms to assess and predict the risk of ACS in patients. However, due to the sensitivity and privacy of medical data, training machine learning models on a centralized server that aggregates ACS data from various institutions poses certain risks. For the first time, this study validates the effectiveness of utilizing federated learning to collaboratively analyze medical data for predicting ACS. A federated learning-based ACS prediction model, i.e., FedLG, which incorporates local–global collaboration for mutual correction, is presented accordingly. On the client side, a regularization term is added to the loss function to reduce deviations caused by heterogeneous data, helping the global model remain accurate and representative. On the server side, gradient normalization is applied to balance contributions from clients with different update frequencies, resulting in a more stable and reliable global model. Comprehensive experiments on the ACS dataset from a tertiary hospital in China show that FedLG consistently outperforms models trained on individual clients, as well as three other federated baselines, across seven evaluation metrics under both IID and non-IID settings. Temporal hold-out validation further indicates that FedLG maintains better generalizability than other baselines. In addition, analysis of feature importance shows that FedLG identifies lipid-related biomarkers, which aligns with clinical knowledge, enhancing the interpretability of the results. The source code of FedLG is freely available at https://github.com/bioinformatics-xu/FedLG.
Title: Federated learning with local–global collaboration for predicting acute coronary syndrome
Authors: Yonggong Ren, Jia Shang, Meiwei Zhang, Xiaolu Xu, Zhaohong Geng
Pub Date: 2025-12-15 | DOI: 10.1016/j.chemolab.2025.105515
Journal: Chemometrics and Intelligent Laboratory Systems, vol. 267, Article 105515
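The two correction mechanisms described in this entry can be sketched generically. The abstract does not give the exact form of either term, so the sketch makes two labeled assumptions: the client-side regularizer is taken to be an L2 proximal penalty toward the global weights, and the server-side normalization divides each client's update by its number of local steps before averaging. The function names are hypothetical, and the FedLG repository linked in the abstract remains the reference for the actual method.

```python
def proximal_loss(task_loss, local_w, global_w, mu):
    """Client side (assumed form): task loss plus an L2 penalty that keeps
    the local weights close to the current global weights."""
    drift = sum((l - g) ** 2 for l, g in zip(local_w, global_w))
    return task_loss + 0.5 * mu * drift


def aggregate_normalized(global_w, client_updates, lr=1.0):
    """Server side (assumed form): average per-step client updates.

    client_updates: list of (delta, local_steps) pairs, where delta is the
    client's total weight change and local_steps is how many local updates
    produced it. Dividing by local_steps stops frequently updating clients
    from dominating the global model.
    """
    n = len(client_updates)
    avg = [0.0] * len(global_w)
    for delta, steps in client_updates:
        for i, d in enumerate(delta):
            avg[i] += d / steps / n
    return [w + lr * a for w, a in zip(global_w, avg)]
```

For example, a client that moved `[2.0, 4.0]` in two local steps and one that moved `[1.0, 1.0]` in a single step contribute per-step updates `[1.0, 2.0]` and `[1.0, 1.0]`, giving an averaged step of `[1.0, 1.5]`.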
Pub Date: 2025-12-15 | Epub Date: 2025-09-10 | DOI: 10.1016/j.chemolab.2025.105531
Emil Rynkeby Kristensen, Jonas Dornonville de la Cour, Tobias Warburg, René Lynge Eriksen, Bjarke Jørgensen, James Emil Avery, Mogens Hinge
Inline industrial application of high-speed hyperspectral imaging for real-time chemometric analysis is computationally difficult because of the complexity of the analysis and the large amount of spectral data that must be processed in real time. The image resolution and acquisition rate of modern sensors, together with increased ambition for detail and accuracy, make it a challenge to design computational methods and to implement them efficiently enough to complete within the few milliseconds available between frames. Real-time chemometrics, including intensity calibration, Savitzky-Golay filtering, principal component analysis, and support vector machine classification for plastic identification, was performed directly on a hyperspectral camera. Three processing scenarios were evaluated: a Python-based CPU implementation, a C++ CPU implementation, and a GPU implementation using OpenCL. Performance was assessed in terms of total processing time per image. The results demonstrate that GPU-based processing increased the frame rate to 160 fps, compared to the 35 fps and 94 fps achieved with CPU-based processing. Analysis shows that the speed of the GPU-based processing is limited by the image acquisition rate of the sensor. The GPU processing has excess computational capacity, which enables integration of more complex classification models or parallel execution of multiple models with different purposes. Removing data processing as the limiting factor of performance increases the industrial relevance of hyperspectral imaging systems.
Title: High-speed processing of hyperspectral images for enabling demanding industrial applications
Journal: Chemometrics and Intelligent Laboratory Systems, vol. 267, Article 105531
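The first two steps of the processing chain in this entry, and the per-frame time budget implied by a frame rate, can be sketched for a single pixel spectrum. The sketch assumes a dark/white reference calibration and a 5-point quadratic Savitzky-Golay window (coefficients -3, 12, 17, 12, -3 over 35); the abstract does not state the window length or calibration formula used, the function names are hypothetical, and the PCA and SVM stages are not shown.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def calibrate(raw, dark, white):
    """Intensity calibration per band: (raw - dark) / (white - dark)."""
    return [(r - d) / (w - d) for r, d, w in zip(raw, dark, white)]


# 5-point quadratic Savitzky-Golay smoothing coefficients (sum = 35).
SG5 = (-3, 12, 17, 12, -3)


def savgol5(spectrum):
    """Smooth interior points with the 5-point window; edges are left as-is."""
    out = list(spectrum)
    for i in range(2, len(spectrum) - 2):
        window = spectrum[i - 2:i + 3]
        out[i] = sum(c * v for c, v in zip(SG5, window)) / 35
    return out
```

At 160 fps, `frame_budget_ms(160)` gives 6.25 ms for everything that must happen to one frame, compared with about 28.6 ms at 35 fps. Because the same small kernel runs independently for every pixel, this kind of workload maps naturally onto a GPU.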
Pub Date: 2025-12-15 | Epub Date: 2025-09-12 | DOI: 10.1016/j.chemolab.2025.105534
Youpeng Fan, Yongchun Fang
In recent years, the combination of vibrational spectral data and data-driven methods has dominated the development and application of closed-set spectral recognition. In practical applications, however, open spectral categories (i.e., novel or unknown spectral categories) may be encountered, because collecting comprehensive categories is time-consuming and requires professional expertise. The intuitive solution is to obscure features of different categories, but exploratory experiments along these lines yield unsatisfactory open-set performance, which may be attributed to sparse spectral features and high inter-class similarity. To remedy this issue, we propose an end-to-end scheme combining Multiple Features Fusion and Mixup with Conditional Decoder (MFFMCD). To enhance feature representation, MFFMCD adopts two auxiliary feature extraction modules and fuses the features of the different branches. To cope with high inter-class similarity, the enhanced features are obscured within a mini-batch and restored to the corresponding class samples through a conditional decoder, mimicking the feature distribution of unknown classes. Experiments on three publicly available spectral datasets show that the proposed MFFMCD significantly outperforms existing methods. Finally, extensive ablation studies are conducted to investigate the effectiveness, correctness, and robustness of our proposal.
Title: Multiple features fusion and mixup with conditional decoder for
Journal: Chemometrics and Intelligent Laboratory Systems, vol. 267, Article 105534
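The mixing step in this entry can be illustrated with standard mixup, which forms a convex combination of two feature vectors drawn from the same mini-batch. This is a generic sketch under the assumption that the "obscuring" follows the usual mixup formula; the abstract does not specify the exact scheme, the function names are hypothetical, and the conditional decoder that restores class samples is not shown.

```python
import random


def mixup(x_i, x_j, lam):
    """Convex combination lam * x_i + (1 - lam) * x_j of two feature vectors."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]


def mixup_batch(features, lam, rng=random):
    """Mix every feature vector with a randomly chosen partner from the batch."""
    partners = list(range(len(features)))
    rng.shuffle(partners)
    return [mixup(features[i], features[j], lam)
            for i, j in enumerate(partners)]
```

With `lam = 0.25`, mixing `[1.0, 0.0]` and `[0.0, 1.0]` gives `[0.25, 0.75]`, a point between the two classes. Samples of this kind are what an open-set model can use as stand-ins for unknown categories.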
Pub Date: 2025-12-15 | Epub Date: 2025-09-06 | DOI: 10.1016/j.chemolab.2025.105522
Canfeng Liu, Binhui Wang, Hui Dong, Yihan Pan, Jiawen Lin, Jintian Yang, Yihui Tao, Hao Sun
The contemporary landscape of medical diagnostics and therapeutic interventions has witnessed a remarkable surge in the production of time series data. Artificial intelligence (AI), particularly deep learning, has shown promise for uncovering the high-dimensional, meaningful information hidden in these diagnostic data. In this work, we propose a novel analytical approach for intelligent nucleic acid amplification tests (NAAT) based on deep learning and paper microfluidics. On-chip amplification data were fed directly to a deep learning model derived from the Transformer neural network. To facilitate the development and deployment of the approach, we applied lightweight processing to the Transformer model. Then, the capacity of the model to accurately predict the reaction trend and end-point value was validated. We also used ablation experiments to evaluate the effects of various parameters on prediction performance and then optimized the model. Next, three clinical datasets comprising 706 positive and 205 negative samples obtained from Fujian Provincial Hospital were used to verify the generalization of the approach. Without any modification of the model structure or hyperparameters, the presented approach achieved an accuracy, sensitivity, and specificity of 98.28 %, 97.52 %, and 99.02 %, respectively. Further comparison studies with nine different AI algorithms, including recurrent neural networks and long short-term memory, were performed. The presented study has the potential to facilitate routine diagnostic tasks for pandemic prevention and to propel the development of smart portable instruments.
{"title":"Time series analysis of nucleic acid reactions via a generalized transformer model","authors":"Canfeng Liu , Binhui Wang , Hui Dong , Yihan Pan , Jiawen Lin , Jintian Yang , Yihui Tao , Hao Sun","doi":"10.1016/j.chemolab.2025.105522","DOIUrl":"10.1016/j.chemolab.2025.105522","url":null,"abstract":"<div><div>The contemporary landscape of medical diagnostics and therapeutic interventions has witnessed a remarkable surge in the production of time series data. Artificial intelligence (AI), particularly deep learning, has shown promise for uncovering the high-dimensional, meaningful information hidden in these diagnostic data. In this work, we propose a novel analytical approach for intelligent nucleic acid amplification tests (NAAT) based on deep learning and paper microfluidics. On-chip amplification data were fed directly to a deep learning model derived from the Transformer neural network. To facilitate the development and deployment of the approach, we applied lightweight processing to the Transformer model. Then, the capacity of the model to accurately predict the reaction trend and end-point value was validated. We also used ablation experiments to evaluate the effects of various parameters on prediction performance and then optimized the model. Next, three clinical datasets comprising 706 positive and 205 negative samples obtained from Fujian Provincial Hospital were used to verify the generalization of the approach. Without any modification of the model structure or hyperparameters, the presented approach achieved an accuracy, sensitivity, and specificity of 98.28 %, 97.52 %, and 99.02 %, respectively. Further comparison studies with nine different AI algorithms, including recurrent neural networks and long short-term memory, were performed. 
The presented study has the potential to facilitate routine diagnostic tasks for pandemic prevention and to propel the development of smart portable instruments.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"267 ","pages":"Article 105522"},"PeriodicalIF":3.8,"publicationDate":"2025-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145046222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
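For readers unfamiliar with the three metrics quoted in the NAAT abstract above, the sketch below shows how accuracy, sensitivity, and specificity are conventionally computed from confusion-matrix counts. The counts in the usage lines are hypothetical and chosen only for easy arithmetic; they are not the paper's data, and the abstract does not say how its figures were aggregated over the three clinical datasets.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp/fn: positive samples predicted positive/negative.
    tn/fp: negative samples predicted negative/positive.
    """
    sensitivity = tp / (tp + fn)   # fraction of true positives detected
    specificity = tn / (tn + fp)   # fraction of true negatives detected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical counts (NOT from the paper): 100 positives, 50 negatives.
acc, sens, spec = classification_metrics(tp=90, fn=10, tn=45, fp=5)
print(f"accuracy={acc:.2%} sensitivity={sens:.2%} specificity={spec:.2%}")
```

With imbalanced classes, as in a cohort with far more positive than negative samples, accuracy alone can hide poor performance on the minority class, which is why sensitivity and specificity are reported alongside it.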
Pub Date: 2025-12-15 Epub Date: 2025-09-04 DOI: 10.1016/j.chemolab.2025.105519
Xue-Yu Zhang, Qun-Xiong Zhu, Ming-Jia Liu, Feng Ma, Yi Luo, Wei Ke, Yan-Lin He, Ming-Qing Zhang, Yuan Xu
Low variability in industrial processes intensifies data scarcity and produces anomalous distributions that compromise the accuracy of data-driven models. Existing sample generation methods often overlook key factors such as sparsity and correlation among data. To address these challenges, this paper proposes a StyleGAN-based virtual sample generation method with an embedded self-attention mechanism (SASG-VSG). First, StyleGAN is used to map the original data space to a disentangled latent space. The output variables then act as control conditions, guiding the model to interpolate along the output dimension to ensure a more uniform distribution of generated samples. In addition, a self-attention module is incorporated into the discriminator to enhance its ability to capture the similarity between the virtual samples and the original data distribution. Finally, validation experiments on a purified terephthalic acid (PTA) solvent system and a sulfur recovery unit (SRU) confirm the capability of the proposed SASG-VSG in generating high-quality virtual samples for soft-sensing applications.
{"title":"Self-attention embedded StyleGAN for virtual sample generation in sensing applications","authors":"Xue-Yu Zhang , Qun-Xiong Zhu , Ming-Jia Liu , Feng Ma , Yi Luo , Wei Ke , Yan-Lin He , Ming-Qing Zhang , Yuan Xu","doi":"10.1016/j.chemolab.2025.105519","DOIUrl":"10.1016/j.chemolab.2025.105519","url":null,"abstract":"<div><div>Low variability in industrial processes intensifies data scarcity and produces anomalous distributions that compromise the accuracy of data-driven models. Existing sample generation methods often overlook key factors such as sparsity and correlation among data. To address these challenges, this paper proposes a StyleGAN-based virtual sample generation method with an embedded self-attention mechanism (SASG-VSG). First, StyleGAN is used to map the original data space to a disentangled latent space. The output variables then act as control conditions, guiding the model to interpolate along the output dimension to ensure a more uniform distribution of generated samples. In addition, a self-attention module is incorporated into the discriminator to enhance its ability to capture the similarity between the virtual samples and the original data distribution. 
Finally, validation experiments on a purified terephthalic acid (PTA) solvent system and a sulfur recovery unit (SRU) confirm the capability of the proposed SASG-VSG in generating high-quality virtual samples for soft-sensing applications.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"267 ","pages":"Article 105519"},"PeriodicalIF":3.8,"publicationDate":"2025-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145046223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-15 Epub Date: 2025-09-20 DOI: 10.1016/j.chemolab.2025.105535
Xiaoqing Zheng, Bo Peng, Anke Xue, Ming Ge, Yaguang Kong, Aipeng Jiang
In modern industry, soft sensors provide real-time predictions of quality variables that are difficult to measure directly with physical sensors. However, in industrial processes, changes in material properties, catalyst deactivation, and other factors often lead to shifts in data distribution. Existing soft sensor models often overlook the impact of these distribution changes on performance. To address the performance degradation caused by changes in data distribution, this paper proposes a self-attention based Difference Long Short-Term Memory (SA-DLSTM) network for soft sensor modeling. By employing self-attention, raw industrial data are refined to facilitate the extraction of nonlinear features, thereby reducing the difficulty of modeling. A Difference Channel is designed to perform correlation analysis and select significant features from the raw data, and then to extract the difference information that can reveal changes in the data distribution. The SA-DLSTM soft sensor model is established and validated on two benchmark industrial datasets: Debutanizer Column and Sulfur Recovery Unit. Comparisons with benchmark and state-of-the-art models show that SA-DLSTM achieves the best performance across all evaluation metrics, demonstrating the effectiveness of the proposed model.
{"title":"Self-attention based Difference Long Short-Term Memory Network for Industrial Data-driven Modeling","authors":"Xiaoqing Zheng, Bo Peng, Anke Xue, Ming Ge, Yaguang Kong, Aipeng Jiang","doi":"10.1016/j.chemolab.2025.105535","DOIUrl":"10.1016/j.chemolab.2025.105535","url":null,"abstract":"<div><div>In modern industry, soft sensors provide real-time predictions of quality variables that are difficult to measure directly with physical sensors. However, in industrial processes, changes in material properties, catalyst deactivation, and other factors often lead to shifts in data distribution. Existing soft sensor models often overlook the impact of these distribution changes on performance. To address the issue of performance degradation due to changes in data distribution, this paper proposes a self-attention based Difference Long Short-Term Memory (SA-DLSTM) network for soft sensor modeling. By employing self-attention, industrial raw data is refined to facilitate the extraction of nonlinear features, thereby reducing the difficulty in modeling. A Difference Channel is designed to perform correlation analysis and select significant features from the raw data, followed by extracting the difference information that can reveal changes in the data distribution. The SA-DLSTM soft sensor model is established and validated on two benchmark industrial datasets: Debutanizer Column and Sulfur Recovery Unit. 
Comparisons with benchmark and state-of-the-art models show that SA-DLSTM achieves the best performance across all evaluation metrics, demonstrating the effectiveness of the proposed model.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"267 ","pages":"Article 105535"},"PeriodicalIF":3.8,"publicationDate":"2025-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145109706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
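The SA-DLSTM abstract above describes a Difference Channel that selects significant features by correlation analysis and then extracts difference information. The sketch below illustrates one simple way such a step could look, assuming Pearson correlation against the target and first-order differencing. The function names, the threshold, and the toy data are hypothetical; the paper's actual selection rule and difference operator are not given in the abstract.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def difference_channel(columns, target, threshold):
    """Keep columns with |correlation to target| >= threshold, then difference them.

    columns: dict mapping a variable name to its time-ordered values.
    Returns a dict of first-order differences x[t] - x[t-1] for the kept columns.
    """
    selected = [name for name, values in columns.items()
                if abs(pearson(values, target)) >= threshold]
    return {name: [b - a for a, b in zip(columns[name], columns[name][1:])]
            for name in selected}


# Hypothetical toy data: "temp" tracks the target, "noise" does not.
columns = {"temp": [1, 2, 4, 7], "noise": [1, -1, 1, -1]}
target = [2, 4, 8, 14]
diffs = difference_channel(columns, target, threshold=0.8)
```

Differencing removes the slowly varying level of each selected variable, so what remains emphasizes step-to-step changes, which is the kind of signal that can expose a drift in the data distribution.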