Pub Date: 2025-12-17 DOI: 10.1016/j.chemolab.2025.105617
Mohammed Saif Ismail Hameed, Robin van der Haar, Ying Chen, Peter Goos
When experimental tests differ in cost and the experiment is constrained by a fixed total budget, the optimal number of tests and the allocation between expensive and inexpensive tests cannot be determined a priori. We propose using a Variable Neighborhood Search (VNS) algorithm to generate optimal experimental designs for such problems. VNS is an intuitive and flexible metaheuristic that has been successfully applied to a wide range of optimization problems. We illustrate the effectiveness of the VNS algorithm by generating improved designs for a micronization experiment.
Title: Optimal design of experiments when not every test is equally expensive. Journal: Chemometrics and Intelligent Laboratory Systems, vol. 269, Article 105617.
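The variable neighborhood search idea in the abstract above can be sketched on a toy problem. This is only an illustration under stated assumptions, not the authors' algorithm: the candidate points, the per-test costs in `COSTS`, the budget, the move set in `neighbors`, and the use of the D-criterion for a straight-line model are all hypothetical choices made so the example stays small.

```python
import random

# Hypothetical setup: tests can be run at x = -1, 0, +1; a test at x = +1 is
# three times as expensive as the others, and the total budget is fixed.
COSTS = (1, 1, 3)
BUDGET = 12

def cost(design):
    """Total cost of a design given as counts (n at x=-1, n at x=0, n at x=+1)."""
    return sum(n * c for n, c in zip(design, COSTS))

def d_criterion(design):
    """det(X'X) for the model y = b0 + b1*x, computed from the run counts."""
    a, b, c = design
    n = a + b + c
    return n * (a + c) - (c - a) ** 2

def neighbors(design):
    """Feasible designs reachable by adding, removing, or moving one test."""
    out = []
    for i in range(3):
        for delta in (1, -1):
            cand = list(design)
            cand[i] += delta
            out.append(tuple(cand))
        for j in range(3):
            if i != j:
                cand = list(design)
                cand[i] -= 1
                cand[j] += 1
                out.append(tuple(cand))
    return [d for d in out if min(d) >= 0 and cost(d) <= BUDGET]

def local_search(design):
    """Steepest ascent: move to the best neighbor until nothing improves."""
    while True:
        best = max(neighbors(design), key=d_criterion)
        if d_criterion(best) <= d_criterion(design):
            return design
        design = best

def vns(start, k_max=3, iterations=50, seed=0):
    """Basic VNS: shake with k random moves, run local search, reset k on improvement."""
    rng = random.Random(seed)
    best = local_search(start)
    for _ in range(iterations):
        k = 1
        while k <= k_max:
            shaken = best
            for _ in range(k):
                shaken = rng.choice(neighbors(shaken))
            candidate = local_search(shaken)
            if d_criterion(candidate) > d_criterion(best):
                best, k = candidate, 1
            else:
                k += 1
    return best

result = vns((0, 0, 0))
print(result, cost(result), d_criterion(result))
```

Note that neither the number of tests nor the split between cheap and expensive tests is fixed in advance here; both follow from the search, which is the situation the abstract describes.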
Pub Date: 2025-12-17 DOI: 10.1016/j.chemolab.2025.105614
Haoran Li, Xin Zhang, Pengchegn Wu, Yang Zhang, Jiyong Shi, Xiaobo Zou
Advances in spectral techniques have generated high-resolution data with thousands of variables. Although an increasing number of variables provides more comprehensive molecular information, it also brings more challenges for existing chemometrics methods, such as the risk of over-fitting and the lack of interpretability. Therefore, we propose a hybrid variable selection approach specifically designed for large-scale datasets. First, considering the continuous characteristics of spectral variables and their importance, interval partial least squares (iPLS) and variable combination population analysis (VCPA) were applied to select relevant variables while reducing the variable space. Second, we consider that truly relevant variables exhibit consistent importance across the sample domain for the same analytical tasks and are therefore more likely to be selected and retained. Consequently, a cross-domain constrained ensemble (CCE) strategy is developed using the least absolute shrinkage and selection operator (LASSO) to further enhance the performance of variable selection. Experiments on wine ¹H NMR and pork Raman spectroscopy datasets demonstrate that the proposed method improves prediction performance in terms of RMSEP and RPD. In addition, the proposed CCE method yields a larger improvement in prediction performance than other final selection methods. These results confirm the effectiveness of both the hybrid variable selection framework and the CCE strategy in handling large-scale spectral datasets.
Title: A hybrid variable selection with cross-domain constrained ensemble (CCE) for large-scale spectroscopic data. Journal: Chemometrics and Intelligent Laboratory Systems, vol. 269, Article 105614.
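The two figures of merit named in the abstract above, RMSEP and RPD, can be computed as in the following sketch. It assumes the common convention that RPD is the sample standard deviation of the reference values divided by RMSEP; the abstract does not say which convention the authors used, and the numbers below are made up for illustration.

```python
import math
import statistics

def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a test set."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def rpd(y_true, y_pred):
    """Ratio of the standard deviation of the reference values to RMSEP."""
    return statistics.stdev(y_true) / rmsep(y_true, y_pred)

reference = [1.0, 2.0, 3.0, 4.0]   # made-up reference values
predicted = [1.1, 1.9, 3.2, 3.8]   # made-up model predictions
print(rmsep(reference, predicted), rpd(reference, predicted))
```

A lower RMSEP and a higher RPD both indicate better prediction, so a variable selection step that helps should move the two in opposite directions.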
Pub Date: 2025-12-15 DOI: 10.1016/j.chemolab.2025.105618
Ieda S. Scarminio, Roy E. Bruns
A short history of the beginning of chemometric activities in Brazil, as well as early international interactions, is presented. Early research efforts on mainframe computers, 8-bit microcomputers and the first 16-bit microcomputers are described. A very brief discussion of the rapid growth of chemometrics in Brazil as a result of readily available software is given.
Title: Chemometrics in Brazil: The early days. Journal: Chemometrics and Intelligent Laboratory Systems, vol. 269, Article 105618.
Pub Date: 2025-12-12 DOI: 10.1016/j.chemolab.2025.105615
Yinran Xiong, Jie Tang, Guangming Qiu, Peng Wang, Yuncan Chen, Jing Jing, Lijun Zhu
Agricultural products often exhibit substantial batch-to-batch variability in their chemical and physical properties due to environmental and other uncontrollable factors, making robust quality monitoring essential for ensuring product consistency and stability. Near-infrared (NIR) spectroscopy offers rich chemical and physical information for qualitative quality assessment, but its high dimensionality and the scarcity of abnormal samples, since non-conforming products are not intentionally manufactured, limit the applicability of conventional supervised learning approaches. To address these challenges, this study proposes Covariance-Shrunk Slow Feature Analysis (CSSFA), a novel unsupervised learning method that integrates covariance shrinkage into the Slow Feature Analysis (SFA) framework. CSSFA mitigates estimation bias in high-dimensional settings and improves the robustness and interpretability of extracted features. Experiments on two NIR tobacco datasets demonstrate that CSSFA effectively captures features related to product quality stability and achieves accurate anomaly detection without requiring large numbers of abnormal samples. This work provides a scalable and interpretable strategy for NIR-based anomaly detection in agricultural products when abnormal samples are rare or unavailable.
Title: An unsupervised approach to anomaly detection in near-infrared spectroscopy via Covariance-Shrunk Slow Feature Analysis. Journal: Chemometrics and Intelligent Laboratory Systems, vol. 269, Article 105615.
Pub Date: 2025-12-12 DOI: 10.1016/j.chemolab.2025.105616
Luiz Renato Rosa Leme de Souza, Carlos Alberto Rios, Márcia Cristina Breitkreitz
Context:
Python is a widely known, robust, open-source programming language used in many research fields. Artificial Intelligence (AI) is a growing tool with many applications that can help with long and difficult tasks. Routines for preprocessing spectral signals and applying chemometric models are usually part of expensive software. Despite the existence of isolated code snippets, libraries, and tutorials, it is hard to find an open-access routine that leads from the raw Raman mapping data set to the chemical information contained in the analyzed samples, presented as chemical maps.
Objectives:
This paper presents an AI-assisted Python-based routine for preprocessing Raman mapping results and generating chemical maps of samples using the chemometric methods CLS, PLS and PCA, with the goal of providing an open-access routine for research purposes.
Methods:
The routine was written in Python, with AI tools used as code generators, translators, and debugging tools to assist its creation, and the results were compared with those obtained by a Matlab routine.
Results:
The Python routine successfully performed the preprocessing of the Raman spectra and the calculations of the chemometric methods CLS, PLS and PCA, generating chemical maps. The results were equivalent to those of Matlab for the same data set, leading to the same conclusions.
Conclusion:
This paper demonstrated the application of an open access Python-based AI-guided routine to preprocess and generate chemical maps applying CLS, PCA and PLS models, now available and editable to suit different needs.
Title: Raman mapping and Chemometrics: An open access Python-based routine to preprocess and generate chemical maps applying CLS, PCA and PLS methods. Journal: Chemometrics and Intelligent Laboratory Systems, vol. 269, Article 105616.
Pub Date: 2025-12-08 DOI: 10.1016/j.chemolab.2025.105606
Sergej Papoci, Manuel Jiménez, Michele Ghidotti, María Beatriz de la Calle Guntiñas
The willingness of consumers to pay higher prices for high-quality specialties, such as Darjeeling tea, goes hand in hand with an increase in fraudulent practices in which Darjeeling tea is substituted totally or partially by cheaper teas. Currently, to evaluate the percentage of substitution that a method can detect, Darjeeling tea is mixed in different proportions with non-Darjeeling teas, and after homogenisation the mixture is analysed. This time-consuming approach uses valuable amounts of sample and, therefore, an alternative approach is needed. Here a method is described to calculate the minimum detectable percentage of substitution of Darjeeling tea by other teas without needing to prepare real mixtures. The approach is based on virtual mixtures computed from the results obtained for commercially available Darjeeling and non-Darjeeling teas. The authentication method used the elemental profiles of tea obtained by Energy Dispersive X-ray Fluorescence, combined with chemometrics and modelling by Partial Least Squares-Discriminant Analysis. The percentage of false positives at different substitution levels was evaluated and compared with the results obtained with real mixtures of Darjeeling and non-Darjeeling teas. Comparable results were obtained with both approaches. Twenty percent was the lowest substitution level that could be detected with an acceptable sensitivity (94 %) and specificity (86 %). A fast, easy-to-implement approach has been developed and validated to calculate the minimum substitution percentage that can be detected by an analytical authentication method, without the need to carry out additional laboratory experiments.
Title: Evaluation of a mathematical approach to detect fraudulent substitution of Darjeeling tea with other types of tea using the elemental profiles obtained by Energy Dispersive X-ray Fluorescence. Journal: Chemometrics and Intelligent Laboratory Systems, vol. 269, Article 105606.
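The virtual-mixture idea in the abstract above can be illustrated with a short sketch. It assumes that the elemental profile of a mixture is the mass-weighted average of the two measured profiles; the abstract does not state the mixing rule the authors used, and the function name `virtual_mixture` and the concentration values are hypothetical.

```python
def virtual_mixture(darjeeling, other, substitution):
    """Elemental profile of a simulated blend.

    darjeeling, other: element concentrations measured on the pure teas,
    listed in the same element order.
    substitution: fraction of Darjeeling replaced by the other tea (0 to 1).
    Linear (mass-weighted) mixing is an assumption of this sketch.
    """
    if not 0.0 <= substitution <= 1.0:
        raise ValueError("substitution must lie between 0 and 1")
    return [(1.0 - substitution) * d + substitution * o
            for d, o in zip(darjeeling, other)]

# Made-up concentrations for two elements, for illustration only.
darjeeling_profile = [10.0, 20.0]
other_profile = [20.0, 40.0]

for level in (0.1, 0.2, 0.5):
    print(level, virtual_mixture(darjeeling_profile, other_profile, level))
```

Each simulated profile could then be passed to an already trained classifier to count how often a blend at a given substitution level is still accepted as authentic.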
Pub Date: 2025-12-06 DOI: 10.1016/j.chemolab.2025.105613
Honghong Wang, Yan Zhang, Anqi Jia, Ting Wu, Yiping Du
Variable selection is a very effective way to improve the performance of a multivariate calibration model built on a high-dimensional spectral dataset. The recently proposed screening strategy of equivalent variables (EVs) and complementary variables (CVs) is worthy of attention. In that strategy, a local search mechanism was used to select the EVs, and the selection range was limited to the area adjacent to the basic variables (BVs) selected by a variable selection method, so variables far from the BVs were not effectively screened. To overcome this limitation, this study proposes a global search mechanism based on full-spectrum scanning to screen EVs and to investigate CVs based on those EVs. The CVs selected from the EVs screened by the global search can provide richer and more accurate feature information to improve the performance of the model. Three variable selection algorithms, stability competitive adaptive reweighted sampling (SCARS), competitive adaptive reweighted sampling (CARS) and Monte Carlo uninformative variable elimination (MC-UVE), were used to screen EVs and CVs. The strategy was applied to three datasets (corn and tablet NIR datasets and a UV–visible dataset). For the corn dataset, compared with the model built from the combination of CVs and BVs in which the local search mechanism was used to screen SCARS from the EVs of CARS and MC-UVE, the model built from 30 CVs combined with BVs under the global search mechanism performed significantly better: RMSEC and RMSEP decreased from 0.0365 and 0.0590 to 0.0305 and 0.0496, respectively. Similarly, the RMSEP of the models built from the CVs of CARS and MC-UVE combined with BVs obtained by the global search decreased from 0.0625 and 0.0505 to 0.0555 and 0.0403, respectively. Similar results were obtained for the other datasets.
Title: Equivalent and complementary variables screening based on global search mechanism for wavelength optimization in spectral multivariate calibration. Journal: Chemometrics and Intelligent Laboratory Systems, vol. 269, Article 105613.
Pub Date: 2025-12-04 DOI: 10.1016/j.chemolab.2025.105602
Marc Offroy, Amir Ayadi, Léon Govohetchan, Janette Ayoub, Thomas M. Hancewicz, Ludovic Duponchel, Mario Marchetti
Molecular spectroscopy is a powerful, non-destructive technique for chemical analysis, as the sample remains unaltered during the measurement. Although it is essential for getting meaningful information, it often suffers from spectral overlap, making it challenging to identify individual components within a sample. Therefore, for over fifty years, a plethora of mathematical approaches have been developed to unmix complex signals and push the detection limits of spectroscopic instruments, such as Blind Source Separation (BSS) or Multivariate Curve Resolution (MCR), to name but a few. However, despite these numerous advances, and even as the amount of data increases (potentially providing more information), these approaches continue to face inherent limitations (i.e., selectivity problems), particularly when dealing with contemporary samples, making their thorough characterization an increasingly intricate challenge, especially with diminishing prior knowledge. This article presents a novel signal unmixing method applied to hyperspectral Raman imaging designed to overcome these limitations. Our approach, based on a Regularized Sparse Non-negative Matrix Factorization (RS-NMF), addresses critical challenges such as rotational ambiguity and noise sensitivity, which often prevent accurate unmixing of pure component spectra. First, we introduce our methodology and explain how it differs from existing mathematical methods. We then evaluate its performance on a real-world hyperspectral Raman imaging dataset well known in the chemometrics community, called "emulsion". To further challenge our method, we apply it to a complex simulated molecular signal dataset. Finally, we compare our results with those obtained using the standard MCR-ALS approach. Our initial results demonstrate that this RS-NMF approach improves the unmixing of complex signals.
Title: Enhanced Raman hyperspectral imaging using RS-NMF: a novel Regularized Sparse Non-negative Matrix Factorization for spectral unmixing. Journal: Chemometrics and Intelligent Laboratory Systems, vol. 269, Article 105602.
Pub Date: 2025-12-03 DOI: 10.1016/j.chemolab.2025.105607
Song Tingting, Sadia Noureen, Saliha Kamran, Sobhy M. Ibrahim, Adnan Aslam
Chemical graph theory serves as a foundational framework in chemical informatics, offering molecular descriptors that enable the prediction of critical physicochemical properties. This study investigates the utility of two recently proposed topological indices, the Lanzhou index and its derivative, the Ad-hoc Lanzhou index, by computing them for four structurally diverse systems: Bismuth(III) Iodide (a layered inorganic compound), Nanostar Dendrimer (a hyperbranched polymer), and the two-dimensional Triangular Oxide and Triangular Silicate Networks. To assess the indices' predictive power, we established linear regression models correlating these indices with five experimentally relevant properties of 21 phenethylamine derivatives: molar refractivity (MR), octanol-water partition coefficient (LOG P), calculated Log P (CLog P), critical volume (CV), and boiling point. Statistical robustness was evaluated using the coefficient of determination (R²), the F-statistic, and the significance level (P-value). The models for boiling point, CV, and MR exhibited strong significance (R² > 0, P = 0), while LOG P and CLog P also showed statistically valid correlations (P = 0), though with slightly lower R² values. Notably, the Lanzhou index demonstrated marginally superior performance in predicting partition coefficients, suggesting its sensitivity to hydrophobic interactions. These results underscore the efficacy of Lanzhou-based indices as reliable tools for quantifying structure–property relationships, particularly in drug design applications where rapid estimation of solubility, volatility, and bioavailability is critical. Our findings advocate for the broader integration of these indices into cheminformatics pipelines to augment molecular screening and optimization processes.
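As a small illustration of the kind of descriptor discussed above, the sketch below computes the Lanzhou index of a simple graph, assuming its usual definition as the sum over vertices of d(v)² multiplied by the degree of v in the complement graph, n − 1 − d(v). The abstract itself does not restate the formula, the Ad-hoc variant is not covered here, and the example graphs are toy structures rather than the molecules studied in the paper.

```python
def lanzhou_index(adjacency):
    """Lanzhou index of a simple graph given as {vertex: set_of_neighbours}.

    Assumed definition: sum over vertices of
    (degree in the complement graph) * (degree ** 2).
    """
    n = len(adjacency)
    total = 0
    for neighbours in adjacency.values():
        degree = len(neighbours)
        complement_degree = n - 1 - degree
        total += complement_degree * degree ** 2
    return total

# Toy graphs: a three-vertex path and a four-vertex star.
path_p3 = {0: {1}, 1: {0, 2}, 2: {1}}
star_s4 = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}

print(lanzhou_index(path_p3), lanzhou_index(star_s4))
```

For a molecule, the vertices would typically be the non-hydrogen atoms and the edges the bonds between them, and the resulting value would serve as the predictor in a linear regression against a measured property.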
Title: Chemometric modeling of physicochemical properties using Lanzhou and Ad-Hoc Lanzhou indices: A multi-scale approach for drug design and material informatics. Journal: Chemometrics and Intelligent Laboratory Systems, volume 269, Article 105607 (Periodical IF 3.8). Publication date: 2025-12-03. DOI: 10.1016/j.chemolab.2025.105607.
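The abstract above does not define the Lanzhou index. As a rough illustration only, the sketch below assumes the definition commonly given in the chemical graph theory literature, Lz(G) = Σ_u d_u² · d̄_u, where d_u is the degree of vertex u and d̄_u = n − 1 − d_u is its degree in the complement graph. The function name `lanzhou_index`, the adjacency-dictionary input, and the toy path graph are illustrative assumptions, not the authors' code or data.

```python
def lanzhou_index(adjacency):
    """Lanzhou index of a simple undirected graph (assumed definition).

    adjacency maps each vertex to the set of its neighbours.
    """
    n = len(adjacency)
    total = 0
    for vertex, neighbours in adjacency.items():
        degree = len(neighbours)
        # Degree of the same vertex in the complement graph.
        complement_degree = n - 1 - degree
        total += complement_degree * degree ** 2
    return total


# Toy example (not one of the systems studied in the paper): the path 0-1-2-3.
path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(lanzhou_index(path4))
```

For the four-vertex path the degrees are 1, 2, 2, 1 and the complement degrees are 2, 1, 1, 2, so the sum is 2·1 + 1·4 + 1·4 + 2·1 = 12. A complete graph gives 0 under this definition, because every complement degree is zero.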
Pub Date : 2025-12-03 DOI: 10.1016/j.chemolab.2025.105609
Omneya Attallah, Ishak Pacal
The accurate classification of brain tumors from MRI scans is important for timely diagnosis and treatment planning; however, previous state-of-the-art automatic image classification methods frequently struggle to balance performance with computational cost in clinical applications. In this study, we evaluated twenty lightweight Convolutional Neural Network (CNN) models and eighteen Vision Transformer (ViT) models for multi-class brain tumor classification using a merged dataset of 17,933 MRI images from four categories (glioma, meningioma, pituitary tumor, and healthy brain). Both groups of architectures achieved state-of-the-art performance, with EfficientNet-b0 (98.36 % accuracy, 4.01 M parameters) and Tiny-ViT-5M (98.41 % accuracy, 5.07 M parameters) ranking as the top-performing models in their respective categories. The systematic comparison showed that the lighter models perform as well as or better than established lightweight frameworks while offering computational advantages; MobileViT-xxSmall, for example, reached 98.16 % accuracy with fewer than 1 M parameters. Through benchmarking against fourteen existing frameworks for brain tumor classification, we demonstrated that the top-performing lightweight models of this study maintain stable performance across all evaluation metrics (including precision, recall, and F1 score) and aim to mitigate key weaknesses of prior work, including limited dataset diversity and model complexity. The findings show highly competitive performance in brain tumor classification, highlighting the promise of lightweight architectures for accurate and efficient diagnostic support in potential clinical deployment, particularly in low-resource healthcare environments where such efficiency is vital. Moreover, this work provides useful knowledge that may assist in developing deployable artificial intelligence solutions in neuro-oncology settings.
Title: Comparative evaluation of lightweight convolutional neural network and vision transformer models for multi-class brain tumor classification using merged large MRI datasets. Journal: Chemometrics and Intelligent Laboratory Systems, volume 269, Article 105609 (Periodical IF 3.8). Publication date: 2025-12-03. DOI: 10.1016/j.chemolab.2025.105609.
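The abstract above reports accuracy, precision, recall, and F1 score for a four-class problem but does not say how they were computed or averaged. The sketch below shows one common way to derive them from a confusion matrix, assuming macro-averaging over classes. The function name `classification_metrics` and the example counts are made up for illustration and are not taken from the paper.

```python
def classification_metrics(confusion):
    """Accuracy plus macro-averaged precision, recall and F1.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(k)) / total

    precisions, recalls, f1_scores = [], [], []
    for i in range(k):
        true_positive = confusion[i][i]
        predicted_as_i = sum(confusion[r][i] for r in range(k))  # column sum
        actually_i = sum(confusion[i])                           # row sum
        p = true_positive / predicted_as_i if predicted_as_i else 0.0
        r = true_positive / actually_i if actually_i else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1_scores.append(f)

    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1_scores) / k,
    }


# Hypothetical counts for glioma, meningioma, pituitary tumor, healthy brain.
example = [
    [95, 3, 1, 1],
    [2, 94, 2, 2],
    [1, 1, 97, 1],
    [0, 1, 0, 99],
]
for name, value in classification_metrics(example).items():
    print(f"{name}: {value:.4f}")
```

Macro-averaging weights every class equally, which matters when the class sizes in a merged dataset differ; a weighted or micro average would give different numbers for the same matrix.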