Pub Date: 2026-03-15; Epub Date: 2026-01-13; DOI: 10.1016/j.chemolab.2026.105633
Yalin Wang, Ruikai Yang, Chenliang Liu, Zhongmei Li, Yijing Fang, Weihua Gui
Accurate online prediction of key quality variables is a guiding indicator for process optimization and stable operation in industrial processes. Due to the continuous and occasionally abrupt nature of industrial processes, industrial data often exhibit complex spatiotemporal coupling characteristics across long-range spatial and adjacent temporal dimensions. In particular, the dynamic variation of local spatiotemporal neighborhood space makes it challenging for traditional methods to capture these patterns. To address this issue, this paper proposes a novel neighborhood attention-aware spatiotemporal manifold autoencoder (NA-STMAE) model for soft sensor modeling of quality variables, which is designed to learn adaptive correlations within spatial and temporal neighborhoods of industrial data. Specifically, a novel attention-based neighborhood computing mode is designed to dynamically allocate weights among local samples, enabling adaptive perception and refinement of neighborhood relationships. Based on this, an attention-aware spatiotemporal neighborhood feature extraction module is developed to learn local spatiotemporal dependencies, thereby enhancing the predictive performance of the proposed soft sensor model. Finally, extensive experiments were conducted on two industrial processes to validate the effectiveness of the proposed model. Experimental results demonstrate that the proposed model outperforms several mainstream soft sensor models in prediction tasks. Moreover, ablation experiments further confirm the critical role of dynamic weight allocation in capturing both temporal and spatial dimensions.
Title: Eyes on every node: Adaptive neighborhood perception for spatiotemporal data intelligent modeling and its industrial application
Journal: Chemometrics and Intelligent Laboratory Systems, volume 270, Article 105633 (periodical IF 3.8)
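The abstract above describes weights that are allocated dynamically among local samples. The following is a minimal sketch of that general idea only, not the NA-STMAE architecture, whose details the abstract does not give. It assumes a scaled dot-product score between a query sample and its `k` preceding samples, normalized with a softmax; the function name `neighborhood_attention` and its parameters are hypothetical.

```python
import math


def neighborhood_attention(samples, t, k):
    """Summarize the k samples preceding index t with softmax attention weights.

    Hypothetical helper: the scoring rule (scaled dot product) is an assumption.
    """
    query = samples[t]
    neighbors = samples[max(0, t - k):t]
    if not neighbors:
        # No temporal neighbors available: fall back to the sample itself.
        return list(query), []
    scale = math.sqrt(len(query))
    scores = [sum(q * x for q, x in zip(query, nb)) / scale for nb in neighbors]
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted average of the neighbors, one value per variable.
    summary = [
        sum(w * nb[j] for w, nb in zip(weights, neighbors))
        for j in range(len(query))
    ]
    return summary, weights
```

Neighbors that resemble the query receive larger weights, so the summary adapts to the local neighborhood instead of averaging it uniformly.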
Pub Date: 2026-03-15; Epub Date: 2026-02-04; DOI: 10.1016/j.chemolab.2026.105658
Federico N. Castañeda, Clara Parzanese, Mario R. Reta, Cecilia B. Castells, Juan Aspromonte, Rocío B. Pellegrino Vidal
Coffee is one of the world's most consumed beverages and the second most traded commodity. Natural roasted coffee, produced by heating green beans to develop its characteristic flavor, is highly appreciated. A variant, torrefacto coffee, incorporates sugar during roasting. While legitimate, this practice can be used to mask off-flavors from lower-quality beans and artificially increase weight, sometimes leading to mislabeling. This study proposes a simple, robust method to authenticate roasted coffee using UV absorbance spectroscopy and a one-class modeling algorithm (Data Driven-Soft Independent Modelling of Class Analogies). Samples of natural, torrefacto, and in-lab adulterated coffee (10%, 25%, and 50%) underwent a water extraction and were diluted for absorbance measurements (200 to 400 nm). A discriminant model was built using the first four principal components, which explained 99.7% of the spectral variance. Trained on 80% of the natural coffee samples, the model was validated on a test set containing the remaining natural samples together with the torrefacto and adulterated samples. The method proved highly effective, detecting adulteration levels as low as 10%. It achieved 100% sensitivity, 97% specificity, and 97% overall accuracy. A White Analytical Chemistry assessment yielded an 86.2% whiteness score, confirming a strong balance between sustainability and analytical performance.
Title: A sustainable and straightforward approach for the authentication of roasted coffee samples based on absorption spectrophotometry coupled with Data Driven-Soft Independent Modelling of Class Analogy
Journal: Chemometrics and Intelligent Laboratory Systems, volume 270, Article 105658 (periodical IF 3.8)
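For readers unfamiliar with how the three figures of merit quoted above are defined for a one-class model, here is a minimal sketch. It assumes the usual one-class convention: sensitivity is the share of target-class samples (here, natural coffee) that the model accepts, and specificity is the share of non-target samples that it rejects. The function name and the counts in the usage lines are illustrative and are not taken from the paper.

```python
def one_class_metrics(accepted_targets, n_targets, rejected_others, n_others):
    """Return (sensitivity, specificity, accuracy) for a one-class model."""
    sensitivity = accepted_targets / n_targets
    specificity = rejected_others / n_others
    accuracy = (accepted_targets + rejected_others) / (n_targets + n_others)
    return sensitivity, specificity, accuracy


# Illustrative counts only (hypothetical, not the study's test set):
# 10 target samples, all accepted; 100 non-target samples, 97 rejected.
sens, spec, acc = one_class_metrics(10, 10, 97, 100)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.3f}")
```

When non-target samples dominate the test set, overall accuracy tracks specificity closely, which is one way sensitivity can be 100% while accuracy and specificity both round to 97%.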
Pub Date: 2026-03-15; Epub Date: 2026-01-13; DOI: 10.1016/j.chemolab.2026.105637
Suraj R. Chaudhari, Atul A. Shirkhedkar
Cilnidipine (CIL) and chlorthalidone (CHL), both antihypertensive agents, are approved for combined regimens for the management of hypertension. A thorough evaluation of their intrinsic stability and content levels in commercially available preparations and biological samples requires a simple and reliable analytical approach. Herein, computational approaches, stability investigations, and retrospective analysis of content uniformity results were explored to support the robustness and long-term suitability of the proposed protocol for routine application. Therefore, this experiment established an ultra-fast liquid chromatography method with photodiode array detection (UFLC-PDA) for simultaneous separation and quantification of CIL and CHL in Cilacar C, Nexovas CH tablets, and biological matrices. The applicability of the established protocol was confirmed against the ICH Q2 (R2), Q1A (R2), and Q1B recommendations. Analytes were extracted in a simple single step and analyzed using a rapid resolution ZORBAX Eclipse C18 column (4.6 mm internal diameter × 100 mm length, 3.5 μm particle size), maintained at a column oven temperature of 33 °C. Resolution was achieved using binary gradient elution at 0.5 mL/min with a solvent system comprising H2O:ACN (25.85:74.15 % v/v). CHL and CIL were detected at retention times (tR) of 2.221 ± 0.003 min and 4.435 ± 0.011 min, respectively, with a total run time <8.0 min. The proposed protocol demonstrates outstanding specificity and sensitivity, offering a systematic platform for developing and refining knowledge related to a UFLC-PDA procedure. Moreover, it shows a comprehensive understanding of the procedure to meet the requirements specified in ICH Q14.
Title: Robust integrated chemometric driven approach for the analysis of cilnidipine and chlorthalidone in biological and pharmaceutical matrices
Journal: Chemometrics and Intelligent Laboratory Systems, volume 270, Article 105637 (periodical IF 3.8)
Pub Date: 2026-03-15; Epub Date: 2026-01-23; DOI: 10.1016/j.chemolab.2026.105648
Naif Almusallam, Maqsood Hayat
The biological functions of bacteria are significantly impacted by bacteriophage virion proteins (BVPs), the proteins that make up the particles of viruses that infect bacteria. BVPs play a major role in phage therapy and genetic engineering. Secure and accurate identification of these proteins is essential for understanding phage-host interactions and for bioinformatics and medical applications. However, ensuring privacy and robustness in computational models is challenging, especially when handling complex biological data. Previous works relied on wet-lab experiments and had limited scalability, incomplete feature coverage, and low generalization ability. In this study, we introduce a privacy-preserving and adversarially robust deep learning framework. It integrates natural language processing (NLP) descriptors with transformer-guided ideal proximity matrix reconstruction to capture rich information from protein sequences. For post-hoc interpretability, we use SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME). These techniques increase transparency and confidence in predictions. SHAP analyzes the dataset to identify the most significant proximity-based and NLP-derived descriptors at global and class levels. LIME provides instance-specific explanations, emphasizing local decision boundaries for particular predictions. The proposed model achieved 95.75 % and 90.27 % accuracy on the training and independent datasets, respectively. We calculated statistical measures, such as Chi-Square and P-value, for each dataset to demonstrate reliability. Our model improves predictive outcomes, transparency, and security. The empirical results validate its outstanding performance compared to existing models, while preserving security and explainability. This makes it suitable and reliable for real-world applications in proteomics and bioinformatics.
Title: Explainable AI for secure and accurate prediction of bacteriophage virion proteins using NLP descriptors and transformer-guided ideal proximity matrix reconstruction
Journal: Chemometrics and Intelligent Laboratory Systems, volume 270, Article 105648 (periodical IF 3.8)
Pub Date: 2026-03-15; Epub Date: 2026-01-13; DOI: 10.1016/j.chemolab.2026.105638
Wenxue Han, Ziteng Zuo, Xiangjing Zhang, Lan Zhang, Weiming Shao
Precisely predicting quality variables is crucial for advanced process control and real-time optimization in continuous chemical processes. Soft sensing technology is utilized for this task due to its advantages of real-time capability and low cost. Dynamical probabilistic latent variable regression (DPLVR) models for soft sensing modeling have attracted increasing attention, owing to their superior feature extraction capability. Nevertheless, the DPLVR-based soft sensing methods only account for variable correlations while neglecting the underlying causal mechanisms. Current research on causal methods primarily focuses on the selection of causal variables and the construction of causal graphs, failing to effectively integrate the causal priors that reflect the underlying mechanisms of chemical processes. In addition, outliers in chemical data further degrade the prediction accuracy of soft sensors, making them inadequate for practical production requirements. Given the above problems, a novel mechanistic causality-guided robust DPLVR (MCR-DPLVR) model is proposed for predicting the quality variables. In the MCR-DPLVR, the mechanistic causality knowledge is used to identify the causal mechanisms among different types of variables, and the Student’s t distribution is utilized to enhance the model’s robustness against outliers. Subsequently, an efficient semi-supervised training algorithm is developed to train the MCR-DPLVR based on the expectation–maximization algorithm. Furthermore, the effectiveness of the MCR-DPLVR is verified on a synthetic numerical case and an actual hydrogen production process, where it shows superior performance in comparison to several cutting-edge methods.
Title: A mechanistic causality-guided robust dynamical probabilistic latent variable regression model and its application to soft sensing of continuous chemical processes
Journal: Chemometrics and Intelligent Laboratory Systems, volume 270, Article 105638 (periodical IF 3.8)
Nowadays, the large number of measurable variables has considerably increased the complexity of data. In the framework of the decision-making process, this leads to the need for adequate tools to set priorities and rank the available options. Ordering is one of the possible ways to analyse multivariate data, and it provides an overview of the relationships among the elements of a system. Multi-Criteria Decision Making (MCDM) encompasses a broad set of methods that support decision problems by building priority-based lists of alternatives from multiple criteria. Among the most widely adopted techniques, TOPSIS, dominance-based approaches, the Analytic Hierarchy Process (AHP), and Copeland scores represent some of the classical methodologies in both theoretical research and applied decision analysis.
Among the dominance-based approaches, an effective MCDM method is the Power-Weakness Ratio (PWR), which generates a tournament table (i.e., the pairwise comparison matrix) from a data matrix with a varying number of samples (i.e., alternatives to be compared) and variables (i.e., the criteria for pairwise comparisons), weighted according to their relative importance in determining the final ranking. In this study, a variant of the classical Power-Weakness Ratio is presented, significantly modifying the way the tournament table is obtained. The method, called smoothed Power-Weakness Ratio (sPWR), takes into account the dominance degree of the alternatives in each pairwise comparison by exploiting the differences between the criterion values. The rationale behind the method is described with the aid of an illustrative example on a simple benchmark dataset with a known reference ranking of the samples. The main advantage of the new method over PWR is that its tournament table is much more informative and sensitive to the original data values than the classical pairwise comparison matrix. A multivariate comparison with other classical MCDM methods, performed on several diverse datasets, demonstrated that the results obtained by sPWR were quite similar to those obtained by Copeland Score and TOPSIS with range scaling. However, sPWR showed a higher tendency toward generating full rankings with an enhanced ability to remove ties in the pairwise comparisons.
Title: Smoothed Power-Weakness Ratio (sPWR): a new informative system for multi-criteria decision making
Authors: Viviana Consonni, Davide Ballabio, Enmanuel Cruz Muñoz, Veronica Termopoli, Roberto Todeschini
Pub Date: 2026-03-15; DOI: 10.1016/j.chemolab.2025.105624
Journal: Chemometrics and Intelligent Laboratory Systems, volume 270, Article 105624 (periodical IF 3.8)
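To make the notion of a tournament table concrete, the sketch below builds a weighted pairwise comparison matrix from a small data matrix. It is illustrative only. The win/tie/loss scoring (1, 0.5, 0) and the `smooth=True` rule, which grades each comparison by the range-scaled difference between criterion values, are assumptions chosen to convey the idea of a dominance degree; they are not the published PWR or sPWR formulas. All criteria are assumed to be "higher is better", and the function name is hypothetical.

```python
def tournament_table(data, weights, smooth=False):
    """Weighted pairwise comparison matrix: entry [i][j] scores i against j.

    data    : list of samples, each a list of criterion values (higher is better)
    weights : one weight per criterion, assumed to sum to 1
    smooth  : if True, use an illustrative graded contribution instead of 1/0.5/0
    """
    n, p = len(data), len(weights)
    ranges = [
        max(row[k] for row in data) - min(row[k] for row in data)
        for k in range(p)
    ]
    table = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            for k in range(p):
                diff = data[i][k] - data[j][k]
                if smooth and ranges[k] > 0:
                    # Graded contribution in [0, 1]; equals 0.5 at a tie.
                    contribution = 0.5 + 0.5 * diff / ranges[k]
                elif diff > 0:
                    contribution = 1.0
                elif diff == 0:
                    contribution = 0.5
                else:
                    contribution = 0.0
                table[i][j] += weights[k] * contribution
    return table
```

With the hard rule, a sample that wins narrowly on one criterion scores the same as one that wins by a wide margin; the graded rule separates the two cases, which is the kind of extra sensitivity to the original data values that the abstract attributes to sPWR.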
Pub Date: 2026-02-15; Epub Date: 2025-11-29; DOI: 10.1016/j.chemolab.2025.105603
Jan P.M. Andries, Gerjen H. Tinnevelt, Yvan Vander Heyden
The well-known Uninformative-Variable Elimination for Partial Least Squares, denoted as UVE-PLS, is not reproducible regarding the selected variables. Additionally, in UVE, variables are selected in the first minimum of the graph of the root mean squared error of cross validation (RMSECV) against the number of retained variables. This results mostly in rather large numbers of selected variables. Therefore, there is a need for a new and reproducible UVE method with better selective and preferably also better predictive abilities. Consequently, the Global-Minimum Error Reproducible Uninformative-Variable Elimination method, denoted as GME-RUVE, is proposed and tested.
In the GME-RUVE method, main characteristics of two existing methods, i.e. Jack-knife-based Partial Least Squares Regression (JK-PLSR) and Global-Minimum Error Uninformative-Variable Elimination (GME-UVE), are combined. JK-PLSR can be considered as a reproducible version of the original UVE method.
In GME-RUVE, as in the JK-PLSR method, no artificial random variables are added to the X matrix, and firstly the significance of the PLS regression coefficients is determined from jack-knifing. Secondly, as in the GME-UVE method, either the global minimum or the critical RMSECV is used for the selection of the variables. The performance of the new GME-RUVE method is investigated using four datasets with multivariate profiles, i.e. either simulated profiles, NIR spectra or theoretical molecular descriptor profiles, resulting in 12 profile-response (X-y) combinations.
The predictive performance of GME-RUVE, using the global RMSECV minimum and both the selective and predictive performances of GME-RUVE, using the critical RMSECV, are significantly better than both those of the JK-PLSR method, using the first local RMSECV minimum, and of the existing UVE method. The selective and predictive performances of the new GME-RUVE method are also much better than those of the existing GME-UVE method. Moreover, variables selected by the above GME-RUVE method have a chemical meaning.
Title: Improved variable reduction in Partial Least Squares modelling by global-minimum error reproducible Uninformative-Variable Elimination
Journal: Chemometrics and Intelligent Laboratory Systems, volume 269, Article 105603 (periodical IF 3.8)
Pub Date: 2026-02-15, Epub Date: 2025-12-03, DOI: 10.1016/j.chemolab.2025.105607
Song Tingting, Sadia Noureen, Saliha Kamran, Sobhy M. Ibrahim, Adnan Aslam
Chemical graph theory serves as a foundational framework in chemical informatics, offering molecular descriptors that enable the prediction of critical physicochemical properties. This study investigates the utility of two recently proposed topological indices (the Lanzhou index and its derivative, the Ad-hoc Lanzhou index) by computing them for four structurally diverse systems: Bismuth(III) Iodide (a layered inorganic compound), Nanostar Dendrimer (a hyperbranched polymer), and the two-dimensional Triangular Oxide and Triangular Silicate Networks. To assess the indices' predictive power, we established linear regression models correlating these indices with five experimentally relevant properties of 21 phenethylamine derivatives: molar refractivity (MR), octanol-water partition coefficient (LOG P), calculated Log P (CLog P), critical volume (CV), and boiling point. Statistical robustness was evaluated using the coefficient of determination (R²), F-statistic, and significance level (P-value). The models for boiling point, CV, and MR exhibited strong significance (R² > 0, P = 0), while LOG P and CLog P also showed statistically valid correlations (P = 0), though with slightly lower R² values. Notably, the Lanzhou index demonstrated marginally superior performance in predicting partition coefficients, suggesting its sensitivity to hydrophobic interactions. These results underscore the efficacy of Lanzhou-based indices as reliable tools for quantifying structure–property relationships, particularly in drug design applications where rapid estimation of solubility, volatility, and bioavailability is critical. Our findings advocate for the broader integration of these indices into cheminformatics pipelines to augment molecular screening and optimization processes.
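As a rough illustration of the workflow described above, the sketch below computes a degree-based index for a small graph and the R² and F-statistic of a one-descriptor linear fit. It assumes the commonly cited definition of the Lanzhou index, Lz(G) = Σ_v d(v)² · (n − 1 − d(v)), where d(v) is the degree of vertex v and n is the number of vertices; the abstract itself does not state the formula. The function names and the toy numbers are hypothetical, the Ad-hoc variant is not shown, and the P-value is omitted because it needs an F-distribution routine outside the standard library.

```python
from typing import Dict, List, Sequence, Tuple


def lanzhou_index(adjacency: Dict[int, List[int]]) -> int:
    """Sum of d(v)^2 * (n - 1 - d(v)) over all vertices (assumed definition)."""
    n = len(adjacency)
    return sum(len(nb) ** 2 * (n - 1 - len(nb)) for nb in adjacency.values())


def linear_fit_stats(
    x: Sequence[float], y: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Ordinary least squares y = a*x + b; returns (a, b, R^2, F)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)
    # F-statistic for one predictor: (R^2 / 1) / ((1 - R^2) / (n - 2)).
    f_stat = float("inf") if r2 == 1.0 else r2 / (1.0 - r2) * (n - 2)
    return slope, intercept, r2, f_stat


# A 4-cycle: every vertex has degree 2 and n = 4.
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(lanzhou_index(cycle4))

# Illustrative numbers only, not data from the paper.
print(linear_fit_stats([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]))
```

In practice the descriptor values for each molecule would take the place of the toy `x` values, and the measured property would take the place of `y`.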
{"title":"Chemometric modeling of physicochemical properties using Lanzhou and Ad-Hoc Lanzhou indices: A multi-scale approach for drug design and material informatics","authors":"Song Tingting, Sadia Noureen, Saliha Kamran, Sobhy M. Ibrahim, Adnan Aslam","doi":"10.1016/j.chemolab.2025.105607","journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"269","pages":"Article 105607"},"PeriodicalIF":3.8,"publicationDate":"2026-02-15","publicationTypes":"Journal Article","isOpenAccess":false}
Pub Date: 2026-02-15, Epub Date: 2025-12-02, DOI: 10.1016/j.chemolab.2025.105605
Yunxin Wang, Wenjing Zhang, Hongguo Wei, Yuetian Ren, Haosong Du, Wenbin Xu, Ailing Tan, Shuo Chen
Bacterial infections are a critical global health issue, requiring rapid and precise pathogen identification for effective infection control. Traditional methods, such as culture and nucleic acid amplification, are often slow and lack sensitivity. Raman spectroscopy combined with deep learning has become a powerful technique for microbial identification. However, bacterial physiological states, genetic variation, interference from biological materials, and differences in laboratory conditions still make its practical application challenging. This study introduces a feature-enhanced, dual-attention-pathway Shifted Window-Ultra (Swin-Ultra) Transformer architecture, integrated with deep transfer learning, to address challenges such as bacterial physiological states, genetic variation, and laboratory condition discrepancies. A Bacterial Pre-trained Transformer (BPT) was developed using the Bacteria-ID database and achieved a classification accuracy of 98.26%. Fine-tuning with clinical datasets yielded accuracies of 99.80% for bacterial pathogens and 98.53% for Cryptococcus genotypes. This approach bridges laboratory models and clinical applications, enhancing unknown pathogen identification, infection control, and public health surveillance, with significant potential to improve patient outcomes.
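The abstract does not describe the internals of Swin-Ultra, so the following is only a generic sketch of the shifted-window idea that Swin-style Transformers are named after, applied to a one-dimensional sequence such as a spectrum. The function names, the window size, and the half-window shift are assumptions for illustration. Attention itself, and the masking that real implementations apply to the wrapped-around window, are not shown.

```python
from typing import List, Sequence


def window_partition(seq: Sequence[int], window: int) -> List[List[int]]:
    """Split a sequence into consecutive, non-overlapping windows."""
    if window <= 0 or len(seq) % window != 0:
        raise ValueError("sequence length must be a positive multiple of window")
    return [list(seq[i:i + window]) for i in range(0, len(seq), window)]


def shifted_window_partition(seq: Sequence[int], window: int) -> List[List[int]]:
    """Cyclically shift the sequence by half a window, then partition it.

    The shift lets positions that sat on either side of a window border
    in the regular partition share a window in the shifted one.
    """
    shift = window // 2
    rolled = list(seq[shift:]) + list(seq[:shift])
    return window_partition(rolled, window)


positions = list(range(8))
print(window_partition(positions, 4))
print(shifted_window_partition(positions, 4))
```

Alternating the two partitions across layers is what lets information cross window borders while each attention step stays local.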
{"title":"Bridging lab-to-clinic: microbiological screening via Swin-Ultra Transformer with transfer learning","authors":"Yunxin Wang, Wenjing Zhang, Hongguo Wei, Yuetian Ren, Haosong Du, Wenbin Xu, Ailing Tan, Shuo Chen","doi":"10.1016/j.chemolab.2025.105605","journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"269","pages":"Article 105605"},"PeriodicalIF":3.8,"publicationDate":"2026-02-15","publicationTypes":"Journal Article","isOpenAccess":false}
Pub Date: 2026-02-15, Epub Date: 2025-11-26, DOI: 10.1016/j.chemolab.2025.105591
Juehong Dai, Liheng Dong, Jingjing Xu, Lingli Deng, Lei Guo, Jiyang Dong
Reliable evaluation of extracted ion chromatograms (EICs) remains a persistent challenge in LC–MS metabolomics, as inaccuracies in peak identification can profoundly impact subsequent data analysis and interpretation. While recent deep learning approaches show promise, their computational burden, limited generalizability, and lack of interpretability hinder broad adoption in routine analytical workflows. To address these limitations, we introduce EXACT-EIC (EXplainable Assessment of Chromatogram qualiTy for EICs), a lightweight, explainable machine learning framework. EXACT-EIC employs a set of 34 handcrafted features to perform two critical tasks: binary classification of EICs (peak vs. noise) and quantitative quality scoring. Benchmarking on curated in-house and public test sets demonstrated that EXACT-EIC achieved 95.2% accuracy and 98.1% recall for classification. For quantitative assessment, it attained a mean absolute error of 0.70 on a 1–10 expert-assigned quality scale. These results consistently outperformed state-of-the-art deep learning methods, including PeakOnly and QuanFormer. Furthermore, Shapley Additive exPlanations (SHAP) analysis quantified the contribution of key chromatographic features (e.g., apex-boundary ratio, distribution entropy) to model predictions, offering transparent mechanistic insights absent in "black-box" architectures. By combining robustness, interpretability, and computational efficiency, EXACT-EIC facilitates reliable EIC evaluation across diverse platforms and experimental conditions. It provides a practical, deployable solution for automated quality control and confident metabolite annotation, addressing a critical need in untargeted LC–MS metabolomics workflows.
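To give a feel for what handcrafted EIC features might look like, the sketch below computes two quantities loosely modelled on the feature names mentioned in the abstract. The definitions are assumptions made for illustration only: the apex-boundary ratio is taken here as the maximum intensity divided by the mean of the two end-point intensities, and the distribution entropy as the Shannon entropy of the intensities normalised to sum to one. The paper's own definitions may differ, and the toy traces are not real data.

```python
import math
from typing import Sequence


def apex_boundary_ratio(intensities: Sequence[float], eps: float = 1e-9) -> float:
    """Apex intensity over the mean of the first and last intensities (assumed)."""
    if len(intensities) < 3:
        raise ValueError("need at least three points")
    boundary = (intensities[0] + intensities[-1]) / 2.0
    # eps guards against division by zero when both end points are zero.
    return max(intensities) / max(boundary, eps)


def distribution_entropy(intensities: Sequence[float]) -> float:
    """Shannon entropy (natural log) of the normalised intensity profile (assumed)."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("intensities must have a positive sum")
    probs = [v / total for v in intensities if v > 0]
    return -sum(p * math.log(p) for p in probs)


# Toy traces for illustration only.
peak_like = [1.0, 5.0, 10.0, 5.0, 1.0]
flat_noise = [1.0, 1.0, 1.0, 1.0]

print(apex_boundary_ratio(peak_like), distribution_entropy(peak_like))
print(apex_boundary_ratio(flat_noise), distribution_entropy(flat_noise))
```

Under these assumed definitions, a sharp peak gives a high ratio and a lower entropy than a flat trace of the same length, which is the kind of contrast a peak-versus-noise classifier could exploit.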
{"title":"Explainable machine learning enables robust evaluation of extracted ion chromatograms in LC–MS metabolomics","authors":"Juehong Dai, Liheng Dong, Jingjing Xu, Lingli Deng, Lei Guo, Jiyang Dong","doi":"10.1016/j.chemolab.2025.105591","journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"269","pages":"Article 105591"},"PeriodicalIF":3.8,"publicationDate":"2026-02-15","publicationTypes":"Journal Article","isOpenAccess":false}