Pub Date: 2024-07-03 DOI: 10.1016/j.chemolab.2024.105170
Mia Hubert, Mehdi Hirari
Multiway data extend two-way matrices into higher-dimensional tensors and are often explored through dimension reduction techniques. In this paper, we study the Parallel Factor Analysis (PARAFAC) model for multiway data, which represents the data more compactly through a small set of loading matrices and scores. We assume that the data may be incomplete and may contain both rowwise outliers (cases that deviate from the majority) and cellwise outliers (outlying cells dispersed throughout the data array). To address these challenges, we present a novel algorithm designed to robustly estimate both loadings and scores. Additionally, we introduce an enhanced outlier map to distinguish various patterns of outlying behavior. Through simulations and the analysis of fluorescence Excitation-Emission Matrix (EEM) data, we demonstrate the robustness of our approach. Our results underscore the effectiveness of the diagnostic tools in identifying and interpreting unusual patterns within the data.
Title: MacroPARAFAC for handling rowwise and cellwise outliers in incomplete multiway data. Journal: Chemometrics and Intelligent Laboratory Systems.
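The PARAFAC model mentioned in the abstract can be illustrated with a minimal sketch. It only shows the classical trilinear reconstruction and cellwise residuals that skip missing cells; it is not the robust MacroPARAFAC algorithm itself, and the factor matrices `A`, `B`, `C` and their values are hypothetical.

```python
def parafac_reconstruct(A, B, C):
    """Rebuild a three-way array from PARAFAC factor matrices.

    A is I x F (scores), B is J x F and C is K x F (loadings).
    Each fitted cell is x_hat[i][j][k] = sum over f of A[i][f] * B[j][f] * C[k][f].
    """
    n_comp = len(A[0])
    return [
        [
            [sum(A[i][f] * B[j][f] * C[k][f] for f in range(n_comp))
             for k in range(len(C))]
            for j in range(len(B))
        ]
        for i in range(len(A))
    ]


def cell_residuals(X, X_hat):
    """Cellwise residuals; missing cells (None) are skipped and stay None."""
    return [
        [
            [None if x is None else x - xh for x, xh in zip(row, row_hat)]
            for row, row_hat in zip(slab, slab_hat)
        ]
        for slab, slab_hat in zip(X, X_hat)
    ]


# Hypothetical 2 x 2 x 2 example with F = 2 components.
A = [[1.0, 0.0], [0.0, 2.0]]
B = [[1.0, 1.0], [2.0, 0.0]]
C = [[1.0, 3.0], [2.0, 1.0]]
X_hat = parafac_reconstruct(A, B, C)
```

Large residuals concentrated in one case would point to a rowwise outlier, while isolated large residuals would point to cellwise outliers; how these are weighted and flagged is specific to the paper and not reproduced here.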
Pub Date: 2024-07-01 DOI: 10.1016/j.chemolab.2024.105173
Xueping Yang, Fuyu Yang, Matthieu Lesnoff, Paolo Berzaghi, Alessandro Ferragina
This study assessed the predictive accuracy of Near-Infrared Spectroscopy (NIRS) across a large multi-product library using novel local calibration methodologies. Three local strategies were examined: the LOCAL algorithm, locally weighted regression based on k-nearest neighbor selection (kNN-LWPLSR), and a newly proposed algorithm called Hybrid Local. These strategies were applied to an extensive multi-product dataset. Compared with global PLS models, all local strategies showed significant reductions in RMSEP. In particular, kNN-LWPLSR predicted the constituents ADF and DM well. The newly proposed Hybrid Local method performs comparably to the LOCAL algorithm but halves the prediction time, a significant advance for the practical implementation of NIRS technology in industrial processing scenarios.
Title: Diverse local calibration approaches for chemometric predictive analysis of large near-infrared spectroscopy (NIRS) multi-product datasets. Journal: Chemometrics and Intelligent Laboratory Systems.
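The idea behind local calibration can be sketched as follows: for each new spectrum, select its nearest neighbors in the library and fit a small model on them only. This sketch is not any of the three algorithms in the study. It assumes Euclidean distance for neighbor selection and replaces the local PLS regression with a simple distance-weighted mean; the library values are hypothetical.

```python
import math


def k_nearest(library, query, k):
    """Return the k (distance, index) pairs closest to the query spectrum."""
    distances = sorted(
        (math.dist(spectrum, query), index)
        for index, (spectrum, _) in enumerate(library)
    )
    return distances[:k]


def local_predict(library, query, k=3):
    """Predict from the k nearest library samples only.

    Stand-in for a local regression: a weighted mean of the neighbors'
    reference values, with weights that decrease with distance.
    """
    neighbours = k_nearest(library, query, k)
    weights = [1.0 / (1.0 + d) for d, _ in neighbours]
    values = [library[index][1] for _, index in neighbours]
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


# Hypothetical library of (spectrum, reference value) pairs.
library = [([0.0, 0.0], 10.0), ([1.0, 0.0], 20.0), ([10.0, 10.0], 100.0)]
prediction = local_predict(library, [0.0, 0.0], k=2)
```

Because the distant third sample never enters the local model, it cannot distort the prediction, which is the motivation for local over global calibration on heterogeneous multi-product libraries.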
Pub Date: 2024-06-26 DOI: 10.1016/j.chemolab.2024.105171
C. Ortiz-Abellán, E. Aguado-Sarrió, J.M. Prats-Montalbán, J. Camps-Herrero, A. Ferrer
Magnetic resonance imaging is currently the most sensitive imaging technique for detecting cancerous processes at an early stage. In breast tissue, which has a tubular structure formed by ducts, anisotropic diffusion should be considered instead of the general isotropic diffusion. Anisotropic diffusion is studied with Diffusion Tensor Imaging (DTI), in which the diffusion gradient is applied by changing the magnetic field in several spatial directions.
To date, Multivariate Curve Resolution (MCR) models applied to diffusion sequences have yielded cancer biomarkers that are easy to interpret clinically in isotropic tissues such as the prostate. They have not yet been applied to anisotropic tissues such as the breast.
Therefore, the main objective of this work is to obtain easy-to-interpret imaging biomarkers useful for early breast cancer diagnosis from diffusion magnetic resonance imaging based on the Diffusion Tensor, using MCR models. A classification model to identify healthy and tumor-affected pixels is also proposed.
Title: New breast cancer biomarkers from diffusion magnetic resonance imaging based on the Diffusion Tensor using multivariate curve resolution (MCR) models. Journal: Chemometrics and Intelligent Laboratory Systems.
Pub Date: 2024-06-26 DOI: 10.1016/j.chemolab.2024.105172
Miguel Mengual-Pujante, Antonio J. Perán, Antonio Ortiz, María Dolores Pérez-Cárceles
Blood in the form of stains is one of the most frequently encountered fluids at crime scenes. Estimating the time since deposition (TSD) is of great importance for guiding the police investigation and clarifying criminal offences. The time elapsed since deposition is usually estimated by modelling the physicochemical degradation of blood biomolecules over time. This work presents an ATR-FTIR spectroscopy and chemometrics study to estimate the TSD of bloodstains on various surfaces and under different ambient conditions (indoor and outdoor). A total of 960 stains were analyzed over a period from 0 to 212 days. Most of the eleven partial least squares regression (PLSR) models obtained showed good prediction capacity, with a Residual Predictive Deviation (RPD) higher than 3 and R² higher than 0.90. Models for non-rigid supports showed better predictive capacity than those for rigid ones. A model covering the various non-rigid surfaces and ambient conditions was also built, which might be the most useful one from the criminalistic point of view. These results show that this technique can be a rapid, robust, and reliable tool for in situ determination of the TSD of bloodstains at crime scenes.
Title: Estimation of human bloodstains time since deposition using ATR-FTIR spectroscopy and chemometrics in simulated crime conditions. Journal: Chemometrics and Intelligent Laboratory Systems.
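The two figures of merit quoted in the abstract can be computed as in the sketch below. It assumes the common definitions, RPD as the sample standard deviation of the reference values divided by the RMSEP, and R² as one minus the ratio of residual to total sum of squares; the study may use a variant. The day values are hypothetical.

```python
import math
import statistics


def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def rpd(y_true, y_pred):
    """RPD: standard deviation of the reference values over the RMSEP."""
    return statistics.stdev(y_true) / rmsep(y_true, y_pred)


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical reference and predicted times since deposition (days).
days_true = [1.0, 2.0, 3.0, 4.0, 5.0]
days_pred = [1.1, 1.9, 3.2, 3.8, 5.0]
```

An RPD above 3 means the prediction error is less than a third of the natural spread of the reference values, which is why it is often read as a sign of a usable quantitative model.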
Pub Date: 2024-06-21 DOI: 10.1016/j.chemolab.2024.105168
Darja Cvetković, Marija Mitrović Dankulov, Aleksandar Bogojević, Saša Lazović, Darija Obradović
Fast and accurate prediction of Hansen solubility parameters (HSP) benefits many diverse fields, such as pharmaceuticals, the food industry, and cosmetics. To estimate the individual HSP values (polar, dispersive, and hydrogen bonding components), we investigated the performance of Mordred descriptors in multiple linear regression (MLR) and XGBoost modeling. For HSP predictions, we also tested a graph-based molecular representation with graph neural network (GNN) modeling. To select the optimal models for final training and predictions, we used nested cross-validation and hyper-parameter optimization. The models with the best predictive performance were selected through internal (R²train, RMSE, MEPcv) and external (RMSEP, CCC, MEP, R²test, ar²m, Δr²m) validation metrics using ∼1200 compounds from the freely available database at https://www.stevenabbott.co.uk. To confirm practical reliability, we examined the agreement between experimentally obtained HSP data from the literature for 93 compounds and the values predicted by the created models. GNN modeling showed the best predictive characteristics, with a coefficient of determination between experimental and predicted HSP values greater than 0.76 for polar and hydrogen bond forces and greater than 0.66 for dispersive forces. Interpretation of the MLR equations and XGBoost models showed that HSP values are influenced by van der Waals volume characteristics, 2D matrix molecular representation, and polarity. We illustrate the practical benefits of the selected GNN method using Hansen's solubility sphere as an example. This is the first study to demonstrate the advantages of GNN in predicting individual HSP components, as well as the first to describe their molecular basis in detail using MLR and XGBoost modeling.
Title: Enhancing Hansen Solubility Predictions with Molecular and Graph-Based Approaches. Journal: Chemometrics and Intelligent Laboratory Systems.
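The Hansen solubility sphere mentioned in the abstract can be sketched with the standard Hansen distance, Ra² = 4(δD₁ − δD₂)² + (δP₁ − δP₂)² + (δH₁ − δH₂)², and the relative energy difference RED = Ra / R₀. This only shows how predicted components would be used afterwards, not how the paper predicts them; the parameter values and the radius `r0` are hypothetical.

```python
import math


def hansen_distance(a, b):
    """Hansen distance Ra between two (dispersive, polar, h-bond) triples.

    The dispersive difference carries the conventional factor of 4.
    """
    d_d, d_p, d_h = (x - y for x, y in zip(a, b))
    return math.sqrt(4.0 * d_d ** 2 + d_p ** 2 + d_h ** 2)


def relative_energy_difference(solute, solvent, r0):
    """RED = Ra / R0; values below 1 place the solvent inside the sphere."""
    return hansen_distance(solute, solvent) / r0


# Hypothetical HSP triples (dispersive, polar, hydrogen bonding) in MPa^0.5.
solute = (18.0, 10.0, 7.0)
solvent = (17.0, 10.0, 7.0)
red = relative_energy_difference(solute, solvent, r0=8.0)
```

A RED below 1 suggests the solvent is likely to dissolve the solute, so errors in any predicted component translate directly into errors in this classification.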
Pub Date: 2024-06-20 DOI: 10.1016/j.chemolab.2024.105169
Michaela Chocholoušková, Gabriel Vivó-Truyols, Denise Wolrab, Robert Jirásko, Michela Antonelli, Ondřej Peterka, Zuzana Vaňková, Michal Holčapek
LipidQuant 2.1 is software written in MATLAB, designed for high-throughput processing of large lipidomic data sets measured by lipid class separation coupled with quadrupole time-of-flight (QTOF) high-resolution mass spectrometry (MS). The software identifies lipid species based on a defined mass accuracy. The main focus is on correct lipidomic quantitation, using at least one internal standard per lipid class and an automated procedure for the Type I and Type II isotopic corrections needed to determine accurate molar concentrations, which most existing software solutions do not offer. LipidQuant 2.1 offers three options for peak assignment, visualization of the isotopic pattern, and automated calculation of m/z for various adduct ions. The initial lipidomic database covers 31 lipid classes with more than 2900 lipid species that occur primarily in the human lipidome, but users have full flexibility to modify and extend the database according to their needs. All algorithms and a detailed user manual are provided. The reliability of LipidQuant 2.1 is demonstrated on a set of more than 250 biological samples measured by ultrahigh-performance supercritical fluid chromatography (UHPSFC) coupled with QTOF-MS.
Title: Lipid Quant 2.1: Open-source software for identification and quantification of lipids measured by lipid class separation QTOF high-resolution mass spectrometry methods. Journal: Chemometrics and Intelligent Laboratory Systems.
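Two of the steps described in the abstract, calculating m/z for adduct ions and identifying species within a defined mass accuracy, can be sketched in a few lines. This is written in Python for illustration and is not taken from LipidQuant, which is a MATLAB program. The adduct table uses standard monoisotopic mass shifts rounded to six decimals, and the function names and the 5 ppm tolerance are assumptions.

```python
# Mass shift (Da) and charge for a few common adducts; rounded values.
ADDUCTS = {
    "[M+H]+": (1.007276, 1),
    "[M+NH4]+": (18.033826, 1),
    "[M+Na]+": (22.989221, 1),
    "[M-H]-": (-1.007276, -1),
}


def adduct_mz(neutral_mass, adduct):
    """m/z of an adduct ion from the neutral monoisotopic mass."""
    shift, charge = ADDUCTS[adduct]
    return (neutral_mass + shift) / abs(charge)


def ppm_error(measured_mz, theoretical_mz):
    """Mass accuracy in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def matches(measured_mz, theoretical_mz, tolerance_ppm=5.0):
    """True when the measured peak lies within the ppm tolerance."""
    return abs(ppm_error(measured_mz, theoretical_mz)) <= tolerance_ppm
```

For example, `matches(measured, adduct_mz(mass, "[M+H]+"))` would accept a peak as a candidate identification; isotopic corrections and quantitation against internal standards are separate steps not shown here.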
Pub Date: 2024-06-18 DOI: 10.1016/j.chemolab.2024.105166
M.S. Sánchez, M.C. Ortiz, S. Ruiz, O. Valencia, L.A. Sarabia
The paper deals with the inversion of intervals when a PLS (Partial Least Squares) model is used. Instead of discretizing the interval, it is proved that the region resulting from the inversion of a PLS model is a convex set bounded by two parallel hyperplanes, one for the direct inversion of each endpoint of the given interval.
When the domain of the input variables is a convex set, any feasible solution with predictions within the interval set in the response can be obtained as a convex combination of a point on each of the two hyperplanes. In this way, the new solutions preserve the internal structure of the input variables.
This methodology can be of interest in several domains where the response under study is defined in terms of an interval of admissible values, such as specifications for a product in an industrial process, or tolerance intervals for computing compliant class-models.
The inversion of the corresponding fitted model defines a region in the input space (predictor variables) whose predictions fall within the specified interval. Then, estimating and exploring this region will increase the information about the problem under study.
Title: Latent variable model inversion for intervals. Application to tolerance intervals in class-modelling situations, and specification limits in process control. Journal: Chemometrics and Intelligent Laboratory Systems.
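The geometric claim in this abstract can be checked on a small example. The sketch assumes the fitted model has been reduced to a linear form y = b0 + b·x in the original predictors and uses the minimum-norm solution as the direct inversion of each endpoint; the paper works with the PLS latent structure, so this is a simplification, and the coefficients are hypothetical.

```python
def predict(b0, b, x):
    """Linear prediction y = b0 + b . x."""
    return b0 + sum(bi * xi for bi, xi in zip(b, x))


def direct_inversion(b0, b, y_target):
    """Minimum-norm point on the hyperplane b0 + b . x = y_target."""
    scale = (y_target - b0) / sum(bi * bi for bi in b)
    return [bi * scale for bi in b]


def convex_combination(x_lo, x_hi, t):
    """Point (1 - t) * x_lo + t * x_hi for 0 <= t <= 1."""
    return [(1.0 - t) * lo + t * hi for lo, hi in zip(x_lo, x_hi)]


# Hypothetical model and admissible response interval [6, 11].
b0, b = 1.0, [3.0, 4.0]
x_lo = direct_inversion(b0, b, 6.0)    # lies on the lower hyperplane
x_hi = direct_inversion(b0, b, 11.0)   # lies on the upper hyperplane
x_mid = convex_combination(x_lo, x_hi, 0.5)
```

Because the model is linear, the prediction at `x_mid` is the same convex combination of the two endpoints (8.5 here), so every point between the two hyperplanes is predicted inside the interval.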
Pub Date: 2024-06-17 DOI: 10.1016/j.chemolab.2024.105167
Agnar Höskuldsson
We present a procedure that extends standard PLS regression to several data matrices in a path. The basic idea is to convert the path of data matrices into interconnected regressions. Forecasts by PLS are extended to multi-step forecasts for each data matrix in the path. We study how far we can make forecasts, i.e., how far we can ‘see’ in the path. It is shown how data paths are divided into parts, where multi-step forecasting can be carried out within each part. The principles of PLS are used to suggest criteria for estimation in the regressions. These methods can be used to supervise a complex path of industrial chemical or biological processes. It is shown how expanding and contracting paths, which are common in industrial processes, can be handled. These methods can also be used to analyze general path models. An example briefly shows how a Structural Equations Model (SEM) can be converted into a collection of sequential paths that can be analyzed by the present methods. The results suggest that conclusions drawn from SEM analysis may not always be reliable. The theory is applied to process data. It is shown how each regression is analyzed in a similar way as in PLS.
Title: PLS multi-step regressions in data paths. Journal: Chemometrics and Intelligent Laboratory Systems.
Pub Date: 2024-06-08 DOI: 10.1016/j.chemolab.2024.105155
Paul-Albert Schneide, Neal B. Gallagher, Rasmus Bro
Title: Shift invariant soft trilinearity: Modelling shifts and shape changes in gas-chromatography coupled mass spectrometry. Journal: Chemometrics and Intelligent Laboratory Systems.
Pub Date: 2024-06-06 DOI: 10.1016/j.chemolab.2024.105156
Diana C. Fechner, Ramón A. Martinez, Melisa J. Hidalgo, Adriano Araújo Gomes, Roberto G. Pellerano, Héctor C. Goicoechea
In this study, 110 tea samples from South American countries (Argentina, Brazil, and Paraguay) and Asian countries (India and China) were analyzed using near-infrared spectroscopy (NIRS) together with a two-step chemometric authentication strategy (class modeling techniques and discriminant analysis) to authenticate commercial teas from Argentina. In the first step, one-class models were built and validated to authenticate South American teas using preprocessed NIRS data. For this purpose, data-driven soft independent modeling of class analogy (DD-SIMCA) and one-class partial least squares (OC-PLS) were used. The DD-SIMCA model gave the best results, with a sensitivity of 93.10%, specificity of 100%, and efficiency of 95.00%. In the second step, a support vector machine (SVM) was used to build and validate a multiclass model to discriminate between tea samples from Argentina and neighboring countries of South America. The best model was the combination of nine variables selected by the fast correlation-based filter (FCBF) method, with an accuracy of 98.30%. Therefore, we conclude that the combination of NIRS and two-step chemometric tools can be used to authenticate the geographical origin of samples with high inter-class similarity.
Title: Geographic authentication of Argentinian teas by combining one-class models and discriminant methods for modeling near infrared spectra. Journal: Chemometrics and Intelligent Laboratory Systems.
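The sensitivity and specificity reported for the one-class models follow the usual definitions, sketched below. The counts are hypothetical, chosen only so that the sensitivity rounds to the reported 93.10%; the actual numbers of target and non-target samples are not given in the abstract. Efficiency is left out because its definition varies between papers.

```python
def sensitivity(true_pos, false_neg):
    """Share of target-class samples accepted by the class model."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of non-target samples rejected by the class model."""
    return true_neg / (true_neg + false_pos)


# Hypothetical validation counts for a one-class model.
sens = sensitivity(true_pos=27, false_neg=2)
spec = specificity(true_neg=11, false_pos=0)
```

In a one-class setting, sensitivity depends only on target-class samples and specificity only on foreign samples, so the two should always be reported together.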