Chemometrics and Intelligent Laboratory Systems: Latest Articles

Fused LassoNet: Sequential feature selection for spectral data with neural networks
IF 3.7 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2025-01-11 DOI: 10.1016/j.chemolab.2024.105315
Chaeyun Yeo, Namjoon Suh, Younghoon Kim
Feature selection for high-dimensional spectral data is critical to improving the accuracy and interpretability of chemometric models. Various feature selection methods have been introduced in chemometrics; however, achieving explainable sequential feature selection while simultaneously performing nonlinear classification remains challenging. To address this challenge, this study proposes the fused least absolute shrinkage and selection operator network (fused LassoNet), which integrates the regularization principles of both LassoNet and the fused Lasso within a neural network framework. The fused Lasso facilitates sequential feature selection by accounting for the order of the features, whereas LassoNet enables nonlinear modeling with neural networks. We solve the fused LassoNet problem with proximal gradient descent and mathematically prove the optimality of the proximal operator. The study compares the performance of Lasso, fused Lasso, LassoNet, and fused LassoNet in two-class classification on nine spectral datasets. The fused LassoNet demonstrates superior performance in terms of classification accuracy and sequential feature selection. These results demonstrate that the proposed method enhances the predictive accuracy and interpretability of chemometric models built on spectral data.
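The two regularization ideas named in this abstract can be illustrated with a small sketch. It shows the classical fused Lasso penalty (a sparsity term plus a term on differences between neighbouring coefficients) and the element-wise soft-thresholding operator, which is the proximal operator of the plain L1 penalty. This is not the proximal operator derived in the paper; the function names, the penalty weights `lam_sparse` and `lam_fused`, and the example weights are all assumptions made for illustration.

```python
def soft_threshold(values, threshold):
    """Proximal operator of threshold * ||.||_1, applied element-wise.

    Assumption: this is the textbook Lasso prox, not the paper's operator.
    """
    result = []
    for v in values:
        if v > threshold:
            result.append(v - threshold)
        elif v < -threshold:
            result.append(v + threshold)
        else:
            result.append(0.0)  # small coefficients are set exactly to zero
    return result


def fused_lasso_penalty(weights, lam_sparse, lam_fused):
    """lam_sparse * sum|w_j| + lam_fused * sum|w_j - w_(j-1)|."""
    sparsity = sum(abs(w) for w in weights)
    # Differences between neighbours reward runs of similar coefficients,
    # which is what favours contiguous spectral bands.
    smoothness = sum(abs(b - a) for a, b in zip(weights, weights[1:]))
    return lam_sparse * sparsity + lam_fused * smoothness


# Hypothetical coefficients over six adjacent wavelengths.
weights = [0.0, 0.0, 0.9, 1.0, 1.1, 0.0]
print(fused_lasso_penalty(weights, lam_sparse=0.1, lam_fused=0.5))
print(soft_threshold(weights, 0.5))
```

In a proximal gradient scheme, a gradient step on the smooth loss would be followed by a proximal step of this kind; the fused term needs a more involved operator than the one sketched here.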
Volume 257, Article 105315
Citations: 0
Advanced feature analysis for enhancing cocrystal prediction
IF 3.7 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2025-01-10 DOI: 10.1016/j.chemolab.2025.105318
Alessandro Cossard, Chiara Sabena, Gianluca Bianchini, Emanuele Priola, Roberto Gobetto, Andrea Aramini, Michele R. Chierotti
The design of novel pharmaceutical crystal forms, including molecular salts and cocrystals, has gained significant attention from pharmaceutical companies because these forms can modulate key physicochemical and biopharmaceutical properties. The selection of appropriate coformers for cocrystallization, however, remains a challenge and typically relies on labor-intensive trial-and-error methods. This study introduces FeatureMaster, a tool designed to evaluate how representative a training set is of a test set, thereby enhancing the reliability of machine learning models in predicting cocrystallization outcomes. We employed four key algorithms (feature overlap, quartiles, Cohen's d, and p-value analysis) to assess the predictive accuracy a priori. The efficacy of these methods was evaluated on two systems: piracetam (PRC) and pyridoxine (PN). The test set data were collected from in-house experiments: the PRC and PN test sets were created experimentally with a series of coformers (20 for PRC and 14 for PN) using different synthetic techniques. The experiments led to the formation of 3 new cocrystals for PRC (with quercetin, 2-ketoglutaric acid, and malic acid) and 7 new molecular salts for PN (with 2-ketoglutaric acid, pimelic acid, cinnamic acid, gallic acid, N-acetylcysteine, and caffeic acid). Training sets were collected from the literature, and features were calculated using Hansen Solubility Parameters (HSP), Hydrogen Bond Energy (HBE), Molecular Complementarity (MC), and Quantitative Structure-Activity Relationship (QSAR) methods. Models were developed using the Random Forest algorithm, known for its robustness in handling complex datasets. Our results demonstrate that statistical analyses using overlap, Cohen's d, and p-values are fundamental for improving the prediction and for providing a priori insights into the model's reliability. This approach reduces the number of experimental tests and the resource consumption in the cocrystal screening process, offering a promising strategy for future pharmaceutical development.
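Two of the statistics named in this abstract can be sketched for a single feature. The example computes Cohen's d with a pooled standard deviation and a simple range-coverage score as one possible reading of "feature overlap". It is not the FeatureMaster implementation; the function names and the feature values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(sample_a, sample_b):
    """Standardized mean difference using the pooled sample variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var)


def range_coverage(train_values, test_values):
    """Fraction of test values inside the training range (assumed overlap measure)."""
    low, high = min(train_values), max(train_values)
    inside = sum(1 for v in test_values if low <= v <= high)
    return inside / len(test_values)


# Hypothetical values of one descriptor in the training and test sets.
train_feature = [4.1, 5.0, 5.6, 6.2, 7.3, 8.0]
test_feature = [5.2, 6.0, 6.8, 9.1]

print(cohens_d(train_feature, test_feature))
print(range_coverage(train_feature, test_feature))
```

A small absolute d together with a high coverage would suggest that the training set represents the test set well for that feature, which is the kind of a priori check the abstract describes.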
Volume 257, Article 105318
Citations: 0
GCNFG-DTA:Screening natural medicinal components of Cyperus esculentus targeting kinases with AIDD methods
IF 3.7 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2025-01-04 DOI: 10.1016/j.chemolab.2025.105317
Haiqing Sun, Xuecong Tian, Zhuman Wen, Sizhe Zhang, Yaxuan Yang, Yixian Tu, Xiaoyi Lv
Screening bioactive molecules from natural plant compounds is a common approach in drug discovery. Cyperus esculentus, a multipurpose crop used primarily for food, is highly valued in certain countries and regions for its unique medicinal properties. Although its components and pharmacological effects are understood at a basic level, its effective targets, especially kinase targets, remain insufficiently explored. Our study applies Artificial Intelligence-Assisted Drug Design (AIDD), using the KIBA and BindingDB datasets to train the GCNFG-DTA deep learning model to predict the kinase target affinity of 152 active compounds from Cyperus esculentus. By screening for high-affinity molecule-kinase target pairs and employing molecular docking and molecular dynamics simulations, the study identified the most promising active molecule-target combinations. Our prediction results demonstrate that the GCN-GAT-FG model, with its excellent predictive ability (a low MSE of 0.131 and a high CI of 0.896), significantly accelerates the discovery of bioactive molecules. Molecular docking further validated that 15 high-affinity molecule-kinase target pairs had docking energy scores below −5 kJ/mol. Among these, 14 pairs exhibited stable conformations during 100 ns molecular dynamics simulations. Notably, Cyanidin chloride, N-Feruloyltyramine, and Imbricatonol were identified as the most promising molecules, demonstrating high conformational stability when targeting the MAP3K8, CLK4, and FGR kinases, respectively. These findings provide a scientific basis for further exploring the medicinal potential of Cyperus esculentus. Overall, the deep learning method used in our study offers new insights into natural-compound drug discovery by rapidly and effectively predicting which components of Cyperus esculentus have specific medicinal value.
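The two evaluation figures quoted in this abstract can be made concrete with a short sketch. It assumes that MSE is the mean squared error and that CI denotes the concordance index, as is common in drug-target affinity work; the abstract does not spell this out. The affinity values below are made up and are not results from the study.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order."""
    concordant = 0.0
    comparable = 0
    for i in range(len(y_true)):
        for j in range(i + 1, len(y_true)):
            if y_true[i] == y_true[j]:
                continue  # pairs with tied true values are not comparable
            comparable += 1
            true_diff = y_true[i] - y_true[j]
            pred_diff = y_pred[i] - y_pred[j]
            if pred_diff == 0:
                concordant += 0.5  # tied predictions count as half
            elif (true_diff > 0) == (pred_diff > 0):
                concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs: all true values are tied")
    return concordant / comparable


# Hypothetical measured and predicted affinity scores.
measured = [11.2, 12.1, 10.5, 13.0]
predicted = [11.0, 12.4, 10.9, 12.6]

print(mean_squared_error(measured, predicted))
print(concordance_index(measured, predicted))
```

A concordance index of 1.0 means every pair is ranked correctly and 0.5 corresponds to random ranking, which is why a value near 0.9 is read as strong ranking performance.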
Volume 257, Article 105317
Citations: 0
Smartphone based app development with machine learning using Hibiscus sabdariffa L. extract for pH estimation
IF 3.7 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2025-01-02 DOI: 10.1016/j.chemolab.2024.105310
Ömer Faruk Aydın, Merve Aydın, Melisa Caliskan Demir, Sibel Kahraman
This study presents a novel approach for estimating the pH of buffer solutions from images of solutions prepared with Hibiscus sabdariffa L. as a natural pH indicator. The images of the solutions, each displaying a distinctive colour indicative of its pH level, were converted into standardized 200×200-pixel images using image processing techniques. A pH prediction model was then constructed using the Adaptive Boosting regressor algorithm. The pH values of the training data were distributed irregularly between 0 and 14. The models were trained with 94 pictures and 1880 experimental values. In addition, a reliable image-based pre-processing step was built into the model, allowing test data to be acquired in any desired environment. The training and test data were separated from noise parameters that negatively affect the prediction results. A smartphone application based on the model has been developed and made publicly available. This methodology bridges the gap between traditional pH measurement techniques and computer vision, offering a more accessible and eco-friendly means of pH assessment. The practical applications of this research extend to various fields, including environmental monitoring, agriculture, and educational settings.
Volume 257, Article 105310
Citations: 0
UV–Vis spectralprint-based discrimination and quantification of sugar syrup adulteration in honey using the Successive Projections Algorithm (SPA) for variable selection
IF 3.7 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2024-12-26 DOI: 10.1016/j.chemolab.2024.105314
Luana Leal de Souza, Dâmaris Naara Chaves Candeias, Edilene Dantas Telles Moreira, Paulo Henrique Gonçalves Dias Diniz, Valeria Haydée Springer, David Douglas de Sousa Fernandes
This work developed, for the first time, an improved analytical strategy for discriminating and quantifying the adulteration of honey with corn and agave syrups, using the Successive Projections Algorithm (SPA) for variable selection in UV–Vis spectral analysis. Sample preparation involved only dilution in water before the spectralprint data were acquired. With first-derivative Savitzky-Golay smoothing applied to the spectra and interval selection by SPA, the iSPA-PLS-DA algorithm (Partial Least Squares - Discriminant Analysis) correctly classified all test samples (i.e., 100 % sensitivity, specificity, and accuracy) while selecting 4 out of 15 intervals. Additionally, quantification of honey adulteration using the iSPA-PLS algorithm achieved the lowest relative error of prediction (REP) and limit of detection (LOD), only 5.89 % and 7.02 mg g⁻¹, respectively, while selecting 10 out of 20 intervals. The proposed method aligns with White and Green Analytical Chemistry principles, being simple, quick, affordable, and eco-friendly. It can also aid in developing future protocols and legislation for honey quality.
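The preprocessing and error metric mentioned in this abstract can be sketched in a few lines. The first function applies a Savitzky-Golay first derivative with a fixed 5-point window and a quadratic fit, whose convolution coefficients are (-2, -1, 0, 1, 2)/10; the window and polynomial order used in the study are not stated in the abstract, so these are assumptions. The second function assumes the common chemometric definition REP = 100 × RMSEP / mean reference value. All data values are made up.

```python
from math import sqrt


def savgol_first_derivative(signal, spacing=1.0):
    """5-point, quadratic Savitzky-Golay first derivative (interior points only)."""
    coeffs = (-2.0, -1.0, 0.0, 1.0, 2.0)
    half = len(coeffs) // 2
    derivative = []
    for i in range(half, len(signal) - half):
        acc = sum(c * signal[i + k - half] for k, c in enumerate(coeffs))
        derivative.append(acc / (10.0 * spacing))
    return derivative


def relative_error_of_prediction(y_true, y_pred):
    """REP in percent: root mean squared error relative to the mean reference value."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return 100.0 * sqrt(mse) / (sum(y_true) / len(y_true))


# Hypothetical absorbance values on an evenly spaced wavelength grid.
absorbance = [0.10, 0.12, 0.17, 0.25, 0.31, 0.33, 0.30]
print(savgol_first_derivative(absorbance, spacing=2.0))

# Hypothetical reference and predicted adulterant contents.
print(relative_error_of_prediction([50.0, 100.0, 150.0], [54.0, 97.0, 152.0]))
```

The derivative drops two points at each end of the spectrum because the window cannot be centred there; library implementations usually offer edge-handling options instead.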
Volume 257, Article 105314
Citations: 0
Statistical estimation of mean Lorentzian line width in spectra by Gaussian processes
IF 3.7 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2024-12-24 DOI: 10.1016/j.chemolab.2024.105307
Erik Kuitunen, Matthew T. Moores, Teemu Härkönen
We propose a statistical approach for estimating the mean line width in spectra comprising Lorentzian, Gaussian, or Voigt line shapes. Our approach uses Gaussian processes in two stages to jointly model a spectrum and its Fourier transform. We generate statistical samples for the mean line width by drawing realizations for the Fourier transform and its derivative using Markov chain Monte Carlo methods. In addition to being fully automated, our method enables well-calibrated uncertainty quantification of the mean line width estimate through Bayesian inference. We validate our method using a simulation study and apply it to an experimental Raman spectrum of β-carotene.
Volume 257, Article 105307
Citations: 0
DD-SIMCA as an alternative tool to assess the short-term stability of a marine sediment reference material candidate
IF 3.7 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2024-12-22 DOI: 10.1016/j.chemolab.2024.105312
Clícia A. Gomes, Carlos José M. da Silva, Maria Tereza W.D. Carneiro, Jefferson R. de Souza, Cibele Maria S. de Almeida
Stability testing is essential in the production of a reference material (RM). Stability tests can be classified as short-term (transport conditions) and long-term (shelf life). The stability test is evaluated using the regression method, as indicated by ISO Guide 35. However, some studies have highlighted the use of multivariate methods for evaluating tests performed in RM production. Therefore, this work presents the data-driven soft independent modeling of class analogy (DD-SIMCA) method as a viable alternative for evaluating data from the short-term stability test of a candidate reference material for metal determination in a marine sediment matrix. The test was performed isochronously for one month at a temperature of 60 °C. The samples were decomposed (in triplicate) by the EPA 3051A method and analyzed by inductively coupled plasma mass spectrometry (ICP-MS) and inductively coupled plasma optical emission spectrometry (ICP OES). The mass fractions of samples stored at the reference temperature (−20 °C) were (mg kg⁻¹): 70.4 ± 4.5 for Ba, 12.0 ± 0.7 for Co, 17.9 ± 1.0 for Cu, 60.7 ± 3.3 for Zn, 49137 ± 4790 for Al, and 60021 ± 3090 for Fe. These values were compared with the mass fractions of samples subjected to the test condition (60 °C) for four weeks, which were (mg kg⁻¹): 70.0 ± 4.0 for Ba, 12.1 ± 0.5 for Co, 17.4 ± 0.8 for Cu, 60.6 ± 2.9 for Zn, 48388 ± 3424 for Al, and 58049 ± 1886 for Fe. The mass fractions from the reference and test conditions were compared by the regression method. The DD-SIMCA model was constructed using two principal components, an alpha value and confidence interval of 0.05, and the instrumental quintuplicates of the samples stored at −20 °C. The samples subjected to 60 °C fit the constructed model, indicating that there was no significant difference between the properties of these samples and those maintained at the reference temperature. The RM candidate was considered stable at 60 °C for a period of one month by both the regression method and the DD-SIMCA method. The multivariate DD-SIMCA method was therefore considered a possible alternative and confirmatory tool for evaluating the results of short-term stability tests performed during RM production.
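The regression method referred to in this abstract is usually described as a slope test: the measured value is regressed on storage time, and the material is treated as stable when the slope is not significantly different from zero. The sketch below follows that common description and is not taken from the paper; the weekly values are invented, and the critical t value (3.182 for a two-sided 95 % test with 3 degrees of freedom) is supplied by the caller.

```python
from math import sqrt


def slope_with_standard_error(times, values):
    """Ordinary least-squares slope of values on times, with its standard error."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    residual_ss = sum(
        (v - (intercept + slope * t)) ** 2 for t, v in zip(times, values)
    )
    standard_error = sqrt(residual_ss / (n - 2) / sxx)
    return slope, standard_error


def is_stable(times, values, t_critical):
    """True when |slope| is below t_critical times its standard error."""
    slope, standard_error = slope_with_standard_error(times, values)
    return abs(slope) < t_critical * standard_error


# Hypothetical mass fractions (mg/kg) measured after 0 to 4 weeks at the test temperature.
weeks = [0, 1, 2, 3, 4]
mass_fraction = [17.9, 17.6, 18.1, 17.7, 17.8]

print(slope_with_standard_error(weeks, mass_fraction))
print(is_stable(weeks, mass_fraction, t_critical=3.182))
```

This univariate test is run one analyte at a time, whereas a multivariate method such as DD-SIMCA assesses all analytes jointly.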
Volume 257, Article 105312
Citations: 0
Assessing robust prediction models without test datasets: A causal discovery approach on near-infrared spectra
IF 3.7 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2024-12-21 DOI: 10.1016/j.chemolab.2024.105313
Minh-Quan Nguyen, Mizuki Tsuta, Mito Kokawa
Machine learning prediction models calibrated with spectral data exploit correlations between variables without considering causation. The absence of genuine cause–effect relations makes it difficult to ensure that predictions are systematically reproducible. Tools that support causal discovery are therefore essential in spectroscopy and chemometrics to enhance robustness. Accordingly, this study draws on causal inference theory to establish the causal discovery index (CDI), which distinguishes datasets with reliable causal structures from those prone to spurious correlations. The framework was applied to seven simulated near-infrared spectral causal structures. Simulated near-infrared spectra were used so that the framework's performance could be optimized and verified in a generalizable way. Reliable structures were shown to be distinguishable by differences in the mean and standard deviation of the bootstrapped CDI values. Distinct thresholds for the mean and standard deviation were established at sample sizes of 1000 and 10,000. The framework performed consistently well with multiple spectral preprocessing methods, such as derivative transformation and dimension reduction. It was also robust to variations and surpassed the conventional test-set validation method without requiring additional independent datasets. This benefits the applicability of the framework in practical situations where dataset collection is limited. Moreover, it can be extended to various sensor-based data, encompassing only seven possible causal structures.
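The bootstrapping step described in this abstract, where the mean and standard deviation of an index are compared across resamples, can be sketched generically. The abstract does not define the CDI, so the `statistic` argument below is a placeholder that the caller supplies; the function name, the default number of resamples, and the toy data are assumptions.

```python
import random
from statistics import mean, stdev


def bootstrap_summary(samples, statistic, n_resamples=1000, seed=0):
    """Mean and standard deviation of a statistic over bootstrap resamples."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    estimates = []
    for _ in range(n_resamples):
        # Resample with replacement, keeping the original sample size.
        resample = [rng.choice(samples) for _ in samples]
        estimates.append(statistic(resample))
    return mean(estimates), stdev(estimates)


# Toy data and a placeholder statistic standing in for the paper's index.
toy_values = [0.82, 0.91, 0.77, 0.88, 0.95, 0.80, 0.86, 0.90]
boot_mean, boot_std = bootstrap_summary(toy_values, statistic=mean)
print(boot_mean, boot_std)
```

Thresholds on the two returned numbers would then separate reliable from unreliable structures; the actual threshold values depend on the index and sample size and are not given in the abstract.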
Chemometrics and Intelligent Laboratory Systems, vol. 257, Article 105313. Published 2024-12-21.
Citations: 0
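The abstract above separates reliable from unreliable structures by the mean and standard deviation of a bootstrapped index. The sketch below illustrates only that bootstrap summary step. The CDI itself is not defined in the abstract, so a plain Pearson correlation stands in as a placeholder statistic; the function names `pearson` and `bootstrap_summary`, the number of resamples, and the seed are all assumptions made for illustration, not the authors' method.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation; a placeholder statistic, NOT the paper's CDI."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def bootstrap_summary(x, y, stat=pearson, n_boot=200, seed=0):
    """Resample (x, y) pairs with replacement and return the mean and
    sample standard deviation of `stat` over the resamples."""
    rng = random.Random(seed)
    n = len(x)
    values = []
    for _ in range(n_boot):
        # Draw n sample indices with replacement (one bootstrap resample).
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(stat([x[i] for i in idx], [y[i] for i in idx]))
    return statistics.fmean(values), statistics.stdev(values)
```

In the framework described above, the two returned numbers would then be compared against thresholds chosen for the given sample size; those threshold values are not given in the abstract and are not reproduced here.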
Multiplatform spectralprint strategies for the authentication of Spanish PDO fortified wines using AHIMBU, an automatic hierarchical classification tool
IF 3.7 | Zone 2, Chemistry | Q2 AUTOMATION & CONTROL SYSTEMS | Pub Date: 2024-12-20 | DOI: 10.1016/j.chemolab.2024.105311
Rocío Ríos-Reina , M. Pilar Segura-Borrego , Jose M. Camiña , Raquel M. Callejón , Silvana M. Azcarate
Spanish fortified wines with Protected Designation of Origin (PDO) are esteemed for their deep-rooted tradition, historical significance, and exceptional viticultural quality. Spain boasts four PDOs, ‘Condado de Huelva’, ‘Jerez-Xérès-Sherry’, ‘Sanlúcar de Barrameda’, and ‘Montilla-Moriles’, which produce different types of wines: Fino and Manzanilla undergo biological aging, Olorosos undergo oxidative aging, and Amontillados undergo mixed aging. Because their long aging periods and significant production costs give these wines a high value, they are susceptible to fraud, which makes robust authentication methods necessary. In response to this need, this study explores emerging technologies, such as spectroscopic techniques coupled with different chemometric approaches, to offer rapid, straightforward, and cost-effective solutions to ensure the authenticity of PDO wines. A comprehensive set of PDO fortified wines, encompassing various types and origins, was analyzed by near- and mid-infrared (NIR and MIR) and ultraviolet–visible (UV–Vis) spectroscopies. Preprocessed data were modelled individually, as well as after low-level data fusion, using partial least squares-discriminant analysis (PLS-DA) and a newly available chemometric tool named Automatic Hierarchical Model Builder (AHIMBU). The results showed that the hierarchical classification model generated by AHIMBU outperformed the single PLS-DA models, offering enhanced classification accuracy and efficiency (i.e., the correct classification rate increased by around 40 % from the single PLS-DA models to the AHIMBU models). Among the spectroscopic techniques applied, UV–Vis spectroscopy emerged as the most effective for authentication purposes.
Chemometrics and Intelligent Laboratory Systems, vol. 257, Article 105311. Published 2024-12-20.
Citations: 0
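The low-level data fusion mentioned in the abstract above generally means joining the preprocessed blocks from each instrument sample by sample into one wide matrix before modelling. A minimal sketch of that step follows, assuming each block is a list of per-sample rows in the same sample order. The function name `low_level_fusion` is hypothetical, and the abstract does not say how the blocks were scaled before being joined, so no scaling is shown.

```python
def low_level_fusion(blocks):
    """Concatenate several spectral blocks row by row.

    `blocks` is a list of matrices (lists of rows), one per technique,
    e.g. [nir, mir, uv_vis]. Every block must describe the same samples
    in the same order; row i of the result is row i of each block joined.
    """
    n_samples = len(blocks[0])
    if any(len(block) != n_samples for block in blocks):
        raise ValueError("all blocks must contain the same number of samples")
    fused = []
    for i in range(n_samples):
        row = []
        for block in blocks:
            row.extend(block[i])  # append this technique's variables
        fused.append(row)
    return fused
```

The fused matrix can then be passed to a single classifier, which is the setting the abstract compares against the individually modelled blocks.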
Utilization of artificial intelligence for evaluation of targeted cancer therapy via drug nanoparticles to estimate delivery efficiency to various sites
IF 3.7 | Zone 2, Chemistry | Q2 AUTOMATION & CONTROL SYSTEMS | Pub Date: 2024-12-17 | DOI: 10.1016/j.chemolab.2024.105309
Wael A. Mahdi , Adel Alhowyan , Ahmad J. Obaidullah
Poor delivery efficiency of drug nanoparticles to tumor sites is a major obstacle to developing targeted cancer therapy. The type of drug nanocarrier and its shape, size, materials, and physicochemical properties play important roles in delivery efficiency and should be well understood. This study presents a machine learning approach to predict the delivery efficiency of nanoparticles across various organs in targeted cancer therapy. The focus was on three advanced regression models: Gaussian Process Regression (GPR), Extra Trees (ET) regression, and Local Polynomial Regression (LPR). The integration of these models into the analysis of a complex biomedical dataset, comprising 534 records of nanoparticle properties and their distribution across organs such as the tumor, heart, liver, spleen, lung, and kidney, demonstrates their potential in enhancing predictive accuracy in chemical and biological processes. GPR, a non-parametric probabilistic model, was selected for its robustness in handling small, intricate datasets with complex nonlinear relationships, offering precise uncertainty quantification. ET regression, an ensemble learning method, was chosen for its resilience against overfitting in high-dimensional data, thanks to its approach of constructing multiple unpruned decision trees with randomized splits. LPR was included for its ability to capture local trends in data, providing nuanced predictions without assuming a global parametric form. The dataset underwent rigorous preprocessing, including missing data imputation using the Multivariate Imputation by Chained Equations (MICE) method, outlier detection through Subspace Outlier Detection (SOD), and feature selection using Conditional Mutual Information (CMI). Z-score normalization was applied to standardize the features, aligning them with the Gaussian assumptions of GPR and improving the overall performance of the models. The models were optimized using the Whale Optimization Algorithm (WOA) to maximize predictive accuracy, with GPR and ET models showing significant improvements over baseline models in predicting the biodistribution outcomes.
Chemometrics and Intelligent Laboratory Systems, vol. 257, Article 105309. Published 2024-12-17.
Citations: 0
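Of the preprocessing steps listed in the abstract above, Z-score normalization is the simplest to show concretely: each feature column is shifted to zero mean and scaled to unit standard deviation. The sketch below is a generic illustration using only the Python standard library. The function name `zscore_columns`, the use of the population standard deviation, and the handling of constant columns are assumptions, not details taken from the study.

```python
import statistics


def zscore_columns(rows):
    """Standardize each column of `rows` to zero mean and unit variance.

    Uses the population standard deviation. A constant column has zero
    spread, so its values are mapped to 0.0 instead of dividing by zero.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.pstdev(col) for col in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, sds)]
        for row in rows
    ]
```

In practice the means and standard deviations would be estimated on the training records only and reused for any held-out records, so that no information leaks from the evaluation data into the scaling.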