Journal of Chemometrics: latest articles, page 10

Application of ATR-FTIR Spectrum Combined With Ensemble Learning and Deep Learning for Identification of Amomum tsao-ko at Different Drying Temperatures

IF 2.1, Zone 4 (Chemistry), Q1 SOCIAL WORK

Journal of Chemometrics

Pub Date : 2025-03-05 DOI: 10.1002/cem.70018

Gang He, Shao-bing Yang, Yuan-zhong Wang

Amomum tsao-ko Crevost et Lemaire (A. tsao-ko) is an important medicinal plant and flavoring spice. A. tsao-ko dried at different temperatures differs in nutritional and medicinal value, so substandard products appear on the market from time to time. In this study, attenuated total reflection–Fourier transform infrared (ATR-FTIR) spectra were preprocessed with derivative (SD), normalization, exponentially weighted moving average (EWMA), and standard normal variate (SNV) transformations, and the effect of each on the recognition ability of SVM, RF, XGBoost, and CatBoost models was compared. In addition, full-band and local-band two-dimensional correlation spectroscopy (2DCOS) maps were obtained to characterize differences in the chemical features of A. tsao-ko dried at different temperatures, and these maps were classified with a ResNet model. The results show that although traditional machine learning can achieve good classification results, its classification efficiency was unsatisfactory; the correct classification rate improved to 97% after derivative (SD) preprocessing. The 2DCOS maps visualize the feature information in the samples, and combining them with the ResNet model gave 100% correct classification, with excellent generalization and convergence. This study provides new ideas for the quality evaluation of A. tsao-ko.
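The synchronous and asynchronous 2DCOS maps mentioned above can be illustrated with Noda's generalized two-dimensional correlation formulas. The sketch below is a minimal numpy version under stated assumptions: spectra are ordered by drying temperature, the mean spectrum serves as the reference, SNV is applied first, and the data are synthetic. It is not the authors' pipeline; the full-band/local-band map selection and the ResNet classifier are not shown.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) by its own mean and std."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def hilbert_noda(m):
    """Hilbert-Noda matrix: N[j, k] = 1 / (pi * (k - j)) for j != k, and 0 on the diagonal."""
    idx = np.arange(m)
    diff = idx[None, :] - idx[:, None]  # diff[j, k] = k - j
    with np.errstate(divide="ignore"):  # the diagonal division by zero is masked out below
        return np.where(diff == 0, 0.0, 1.0 / (np.pi * diff))


def two_d_cos(spectra):
    """Generalized 2D correlation (Noda).

    spectra: (m, n) array of m spectra ordered by the perturbation
    (assumed here to be drying temperature) and n spectral variables.
    Returns the synchronous and asynchronous (n, n) maps.
    """
    y = np.asarray(spectra, dtype=float)
    m = y.shape[0]
    dynamic = y - y.mean(axis=0)  # dynamic spectra relative to the mean spectrum
    sync = dynamic.T @ dynamic / (m - 1)
    asyn = dynamic.T @ hilbert_noda(m) @ dynamic / (m - 1)
    return sync, asyn


# Hypothetical example: two bands whose intensities change differently with temperature.
rng = np.random.default_rng(0)
temps = np.linspace(40.0, 80.0, 5)     # illustrative drying temperatures
wn = np.linspace(1000.0, 1800.0, 200)  # illustrative wavenumber axis (cm^-1)
band_a = np.exp(-(((wn - 1200.0) / 20.0) ** 2))
band_b = np.exp(-(((wn - 1600.0) / 20.0) ** 2))
spectra = (np.outer(temps, band_a) + np.outer(temps**2 / 80.0, band_b)
           + 0.01 * rng.standard_normal((5, 200)))
sync, asyn = two_d_cos(snv(spectra))
```

The synchronous map is symmetric and the asynchronous map antisymmetric; asynchronous cross-peaks mark bands whose intensities change out of phase with the perturbation, which is the kind of pattern an image classifier can then learn from.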


Volume 39, Issue 3

Citations: 0

Multidimensional Patterns of Gas Sensors for Assessing the Microbiological Indicators of Raw Milk

IF 2.1, Zone 4 (Chemistry), Q1 SOCIAL WORK

Journal of Chemometrics

Pub Date : 2025-03-04 DOI: 10.1002/cem.70007

Anastasiia Shuba, Tatiana Kuchmenko, Ruslan Umarkhanov, Ekaterina Bogdanova, Ekaterina Anokhina, Inna Burakova

This paper discusses chemometric methods for processing the output of sensors with polycomposite coatings that analyze the gas phase of raw milk, in order to obtain analytical information on its total microbial contamination, its yeast and mold content, and the presence of pathogenic microorganisms. Partial least squares regression and quadratic discriminant analysis were used to predict microbiological indicators of milk quality. The initial data matrix included both an optimized set of sensor outputs and parameters calculated at various data fusion levels. The multidimensional patterns of sensor output were shown to differ depending on the task. A model predicting the microbial contamination of milk (QMAFAnM, the quantity of mesophilic aerobic and facultative anaerobic microorganisms) with an error of 0.342 log CFU was obtained. Using the calculated parameters of the sensor array, discriminant analysis classified milk samples by the presence or absence of pathogenic microorganisms with a sensitivity of 67% and a specificity of 100%. The proposed approaches may be applicable to data from various types of sensors when analyzing real objects of complex composition.
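Quadratic discriminant analysis and the reported sensitivity/specificity can be sketched in a few lines of numpy. Everything here is illustrative: the features stand in for the sensor outputs or fused parameters, the data are synthetic, the small diagonal ridge is an added assumption for numerical stability, and predictions are scored on the training data only to keep the example short (a real evaluation would use held-out samples).

```python
import numpy as np


def qda_fit(x, y, ridge=1e-6):
    """Per-class mean, covariance and prior for quadratic discriminant analysis."""
    params = {}
    for label in np.unique(y):
        xc = x[y == label]
        # Small ridge on the diagonal keeps the covariance invertible (assumption of this sketch).
        cov = np.cov(xc, rowvar=False) + ridge * np.eye(x.shape[1])
        params[label] = (xc.mean(axis=0), cov, len(xc) / len(x))
    return params


def qda_predict(params, x):
    """Assign each row of x to the class with the largest quadratic discriminant score."""
    labels = list(params)
    scores = []
    for label in labels:
        mean, cov, prior = params[label]
        diff = x - mean
        _, logdet = np.linalg.slogdet(cov)
        maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)  # squared Mahalanobis distance
        scores.append(-0.5 * (logdet + maha) + np.log(prior))
    return np.array(labels)[np.argmax(np.array(scores), axis=0)]


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    pos, pred_pos = y_true == positive, y_pred == positive
    tp, fn = np.sum(pos & pred_pos), np.sum(pos & ~pred_pos)
    tn, fp = np.sum(~pos & ~pred_pos), np.sum(~pos & pred_pos)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical sensor-parameter data: 1 = pathogen present, 0 = absent.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 1.0, (20, 3)), rng.normal(5.0, 1.5, (20, 3))])
y = np.array([0] * 20 + [1] * 20)
model = qda_fit(x, y)
sens, spec = sensitivity_specificity(y, qda_predict(model, x))
```

For example, `sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 0])` gives (2/3, 1.0), the same 67% and 100% pattern reported above: one of three positive samples missed and no false positives.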


Volume 39, Issue 3

Citations: 0

Origin of the OECD Principles for QSAR Validation and Their Role in Changing the QSAR Paradigm Worldwide: An Historical Overview

IF 2.1, Zone 4 (Chemistry), Q1 SOCIAL WORK

Journal of Chemometrics

Pub Date : 2025-03-04 DOI: 10.1002/cem.70014

Paola Gramatica

This paper describes the discussions in the QSAR community and the steps that led to the definition of the OECD Principles for the validation of QSAR models, placing the process within the general framework of QSAR modeling. Each OECD Principle is presented and discussed in light of significant publications that have appeared over the years, with particular attention to statistical validation according to the chemometric approach. The paper highlights how, and to what extent, the OECD Principles have influenced the subsequent work of all QSAR modelers and led to significant improvements in validated QSAR modeling applicable in the regulatory field and beyond.


Citations: 0

Novel Sexalinear Decomposition Algorithm for Analyzing the Chemical Sexalinear Data Array

IF 2.1, Zone 4 (Chemistry), Q1 SOCIAL WORK

Journal of Chemometrics

Pub Date : 2025-02-24 DOI: 10.1002/cem.70013

Yue-Yue Chang, Qiu-Na Shi, Tong Wang, Hai-Long Wu, Ru-Qin Yu

As analytical instruments develop toward ever higher-way and more complex data, acquiring ultra-high-way chemical data and exploring methods to analyze them is important and meaningful work. In this paper, a novel six-way algorithm combination method (six-way ACM) is proposed. In addition, a real, chemically meaningful ultra-high-way sexalinear data array was acquired and constructed for the first time. This six-way data array is highly collinear, which places higher demands on its resolution. To verify the feasibility of the proposed algorithm, it was used to analyze this real sexalinear six-way data array and a series of simulated six-way data arrays with different noise levels. The results on real and simulated data demonstrate that the proposed method is well suited to analyzing six-way data arrays and performs attractively: it is insensitive to an overestimated number of components, converges quickly, and is suitable for highly collinear and noisy data. Compared with three-way, four-way, and five-way calibration methods, six-way ACM provides higher sensitivity, lower limits of detection and quantification, and more stable and accurate results, showing an outstanding “higher-order advantage” and a better ability to handle collinearity. This work provides not only a data analysis method for high-order instruments that may emerge in the future but also real data and a methodological reference for theoretical research on high-order tensor algebra.
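The sexalinear structure of a six-way array can be written as x_{ijklmn} = Σ_f a_{if} b_{jf} c_{kf} d_{lf} e_{mf} g_{nf} + noise, a six-way extension of the trilinear PARAFAC model. The sketch below only simulates such an array at a chosen noise level, in the spirit of the simulated arrays described above; it does not implement the six-way ACM decomposition, and the array shape, number of factors, and noise definition are all assumptions.

```python
import numpy as np


def sexalinear_array(loadings):
    """Noise-free six-way array: x[i,j,k,l,m,n] = sum_f A[i,f] B[j,f] C[k,f] D[l,f] E[m,f] G[n,f]."""
    a, b, c, d, e, g = loadings
    return np.einsum("if,jf,kf,lf,mf,nf->ijklmn", a, b, c, d, e, g)


def simulate_six_way(shape=(6, 5, 4, 4, 3, 3), n_factors=3, noise_level=0.05, seed=0):
    """Simulate a six-way array with non-negative loadings plus Gaussian noise.

    noise_level is the noise standard deviation relative to the standard
    deviation of the noise-free array (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    loadings = [np.abs(rng.standard_normal((dim, n_factors))) for dim in shape]
    clean = sexalinear_array(loadings)
    noisy = clean + noise_level * clean.std() * rng.standard_normal(shape)
    return clean, noisy, loadings


clean, noisy, loadings = simulate_six_way()
# In the noise-free case each unfolding has rank equal to the number of factors.
mode1 = clean.reshape(clean.shape[0], -1)  # 6 x 720 unfolding along the first mode
rank = np.linalg.matrix_rank(mode1)
```

Checking the rank (or singular values) of an unfolding is a quick sanity check on a simulated dataset before a decomposition algorithm is tested on it.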


Volume 39, Issue 3

Citations: 0

Efficient Wavelength Selection for Limited Near-Infrared Spectral Data via Genetic Algorithm and Hybrid Regression

IF 2.1, Zone 4 (Chemistry), Q1 SOCIAL WORK

Journal of Chemometrics

Pub Date : 2025-02-24 DOI: 10.1002/cem.70015

Esra Pamukçu

Spectral data often contain a large number of highly correlated variables. Although partial least squares (PLS) regression is designed to handle problems arising from limited sample sizes, its effectiveness may still decrease for extremely small datasets, making it difficult to build a calibration model with high predictive performance. This study introduces a new framework, the Genetic Algorithm and Hybrid Regression Model (GAHRM), designed for variable selection and regression in high-dimensional, low-sample-size spectral datasets. GAHRM combines Hybrid Regression, which builds regression models from a covariance structure that is first stabilized by Thomaz stabilization and then regularized, with a genetic algorithm (GA), an efficient optimization technique for selecting the best subset of variables from a vast model space. Unlike traditional approaches that rely on exhaustive search over model selection criteria, GAHRM uses the GA to navigate the exponentially large search space, making model construction computationally feasible and robust. GAHRM was validated on the benchmark “Gasoline” dataset, where it outperformed PLS in prediction accuracy and model selection efficiency. These results highlight GAHRM as a powerful alternative for wavelength selection and calibration modeling in challenging data scenarios.
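A genetic algorithm for wavelength selection can be sketched as follows: each individual is a boolean mask over wavelengths, and its fitness is a cross-validated prediction error. Important assumptions: plain ridge regression stands in for the paper's Hybrid Regression (Thomaz-stabilized, regularized covariance), which is not reproduced here; tournament selection, uniform crossover, bit-flip mutation, and elitism are generic GA choices rather than the author's settings; and the data are synthetic, not the Gasoline benchmark.

```python
import numpy as np


def ridge_cv_rmse(x, y, mask, alpha=1e-2, folds=5):
    """Fitness: k-fold cross-validated RMSE of ridge regression (a stand-in model) on selected wavelengths."""
    if not mask.any():
        return np.inf
    xs = x[:, mask]
    fold_id = np.arange(len(y)) % folds
    sq_errors = []
    for k in range(folds):
        test = fold_id == k
        x_tr, y_tr = xs[~test], y[~test]
        x_mean, y_mean = x_tr.mean(axis=0), y_tr.mean()
        xc = x_tr - x_mean
        coef = np.linalg.solve(xc.T @ xc + alpha * np.eye(xc.shape[1]), xc.T @ (y_tr - y_mean))
        pred = (xs[test] - x_mean) @ coef + y_mean
        sq_errors.append((y[test] - pred) ** 2)
    return float(np.sqrt(np.mean(np.concatenate(sq_errors))))


def ga_select(x, y, pop_size=30, generations=40, p_init=0.1, p_mut=0.01, seed=0):
    """Return the best wavelength mask found and its fitness (lower is better)."""
    rng = np.random.default_rng(seed)
    p = x.shape[1]
    pop = rng.random((pop_size, p)) < p_init
    fitness = np.array([ridge_cv_rmse(x, y, ind) for ind in pop])
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            # Binary tournament selection for each parent.
            i, j = rng.integers(pop_size, size=2)
            a = pop[i] if fitness[i] < fitness[j] else pop[j]
            i, j = rng.integers(pop_size, size=2)
            b = pop[i] if fitness[i] < fitness[j] else pop[j]
            # Uniform crossover, then bit-flip mutation.
            child = np.where(rng.random(p) < 0.5, a, b)
            child ^= rng.random(p) < p_mut
            children.append(child)
        children = np.array(children)
        child_fit = np.array([ridge_cv_rmse(x, y, c) for c in children])
        # Elitism: keep the best pop_size individuals from parents and children.
        all_pop = np.vstack([pop, children])
        all_fit = np.concatenate([fitness, child_fit])
        best = np.argsort(all_fit)[:pop_size]
        pop, fitness = all_pop[best], all_fit[best]
    return pop[0].copy(), fitness[0]


# Hypothetical data: 40 samples, 100 "wavelengths", response driven by two of them.
rng = np.random.default_rng(0)
x = rng.standard_normal((40, 100))
y = 2.0 * x[:, 10] - x[:, 50] + 0.1 * rng.standard_normal(40)
mask, rmse = ga_select(x, y)
selected = np.flatnonzero(mask)  # indices of the chosen wavelengths
```

Elitism guarantees that the best fitness never worsens between generations; with so few samples the cross-validated fitness is itself noisy, so repeating the search with different seeds is a sensible check.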


Volume 39, Issue 3. Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.70015

Citations: 0

Principal Components Analysis: Row Scaling and Compositional Data

IF 2.1, Zone 4 (Chemistry), Q1 SOCIAL WORK

Journal of Chemometrics

Pub Date : 2025-02-20 DOI: 10.1002/cem.3606

Richard G. Brereton

Row scaling is sometimes called normalisation, but this term is also sometimes used for column standardisation, so we avoid the term normalisation in this article to prevent confusion.

Of course, whether this improvement is observed depends on the structure of the data, but if the differences between samples are primarily due to relative concentrations or proportions and the amount of sample is not easy to control, row scaling to a constant total often results in an improvement. It can be combined with column transformations such as standardisation, as discussed in the previous article.

If there are only two variables, the simplex is a line. In Figure 4, we illustrate the scores of the first two PCs of the dataset formed by the first two variables from Table 1. After row scaling there is only one non-zero PC. In this case, the position along the line relates to the class membership of each object, although this is not always so and depends on an appropriate choice of variables.
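The two-variable case can be checked numerically: after row scaling, every sample satisfies x1 + x2 = constant, so the column-centred data have rank one and the second PC vanishes. The numbers below are made up for illustration and are not Table 1.

```python
import numpy as np


def row_scale(x, total=1.0):
    """Scale each row (sample) so that its variables sum to a constant total."""
    x = np.asarray(x, dtype=float)
    return total * x / x.sum(axis=1, keepdims=True)


def pca(x):
    """Column-centre x and return PC scores and singular values (via SVD)."""
    xc = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(xc, full_matrices=False)
    return u * s, s


# Made-up two-variable data (not Table 1); the total amount varies from row to row.
x = np.array([[2.0, 4.0], [6.0, 11.0], [1.0, 3.0], [5.0, 4.0], [12.0, 9.0], [3.0, 2.0]])
scores_raw, s_raw = pca(x)
scores_rs, s_rs = pca(row_scale(x))
# After row scaling all rows lie on x1 + x2 = 1 (the simplex is a line), so the
# second singular value is zero up to rounding error: only one non-zero PC remains.
```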

For the data in Table 1, row scaling improves the visualisation of class differences and structure. However, row scaling is not always appropriate. If the absolute values of each variable are known accurately (e.g., the amount of sample extracted can be kept constant or calibrated to a known standard), converting to compositional data loses information. In addition, there may be one or two very intense variables that are of subsidiary interest, for example, a primary metabolite that is very intense but has little or no relationship to the factors of interest; the proportions will then be dominated by this uninteresting factor.

However, row scaling is a common procedure in many areas of chemometrics, and there is a substantial statistical literature on multivariate compositional data. If the main aim of an analysis is qualitative, for example, to separate groups or find outliers, the more elaborate statistical considerations are often of secondary importance. If, however, the data are to be used for statistical inference, such as hypothesis tests, p values, or estimation, it is a good idea to consult the classical literature in order to interpret and process compositional data properly.
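As one example from the classical compositional-data toolkit, Aitchison's centred log-ratio (clr) transform replaces each value with the log of its ratio to the geometric mean of its row. It is scale-invariant, so it gives the same result whether or not rows are first scaled to a constant total. This sketch is illustrative only, uses made-up data, is not a procedure taken from the article, and requires strictly positive values.

```python
import numpy as np


def clr(x):
    """Centred log-ratio transform: log of each part minus the mean log of its row.

    Requires strictly positive values (zeros need separate treatment).
    """
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=1, keepdims=True)


x = np.array([[2.0, 4.0, 14.0], [6.0, 11.0, 3.0], [1.0, 3.0, 6.0]])
amounts = np.array([[1.0], [10.0], [0.5]])  # hypothetical per-sample amount factors
z1 = clr(x)
z2 = clr(amounts * x)  # same compositions, different sample amounts
```

Each clr row sums to zero, and `z1` equals `z2`: multiplying a row by a constant adds the same log offset to every part, which the row-mean subtraction removes.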


Volume 39, Issue 3. Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3606

Citations: 0

Detection of Lead Chrome Green in Tea Based on Near-Infrared Reflectance Spectroscopy

IF 2.1, Zone 4 (Chemistry), Q1 SOCIAL WORK

Journal of Chemometrics

Pub Date : 2025-02-18 DOI: 10.1002/cem.70011

Xiaogang Jiang, Penghui Cheng, Kang Ge, Siwei Lv, Yande Liu

Tea color is one aspect of tea quality, and the illegal addition of lead chrome green (LCG) to improve the apparent quality of tea cannot be detected by eye. This paper uses near-infrared (NIR) reflectance spectroscopy to detect LCG-stained tea and investigates the feasibility of qualitative and quantitative methods. First, LCG in tea was analyzed qualitatively with partial least squares discriminant analysis (PLS-DA), random forest (RF), and least squares support vector machine (LSSVM) classification models; the classification accuracy of LSSVM reached 100%. For quantitative analysis, Savitzky–Golay (S-G) smoothing preprocessing was combined with three feature extraction algorithms, namely competitive adaptive reweighted sampling (CARS), uninformative variable elimination (UVE), and the successive projections algorithm (SPA), and partial least squares (PLS), RF, and LSSVM regression models were built in turn on the preprocessed data. S-G-UVE-LSSVM gave the best regression performance for detecting LCG in tea, with a test R² of 0.96. These results show that NIR spectroscopy is feasible for detecting LCG added to tea.
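Savitzky-Golay smoothing, the preprocessing step named above, fits a low-order polynomial by least squares in a sliding window and keeps the fitted value at the window centre. A minimal numpy sketch follows, together with the R² metric used to report the regression result. In practice `scipy.signal.savgol_filter` would usually be used instead; leaving the first and last half-window untouched is a simplifying assumption of this sketch, and UVE variable selection and the LSSVM model are not shown.

```python
import numpy as np


def savgol_coefficients(window, polyorder):
    """Savitzky-Golay smoothing weights: least-squares polynomial fit evaluated at the window centre."""
    if window % 2 == 0 or window <= polyorder:
        raise ValueError("window must be odd and larger than polyorder")
    half = window // 2
    t = np.arange(-half, half + 1, dtype=float)
    vander = np.vander(t, polyorder + 1, increasing=True)  # columns t**0, t**1, ...
    return np.linalg.pinv(vander)[0]  # pseudo-inverse row giving the fitted value at t = 0


def savgol_smooth(spectrum, window=11, polyorder=2):
    """Smooth one spectrum; the first and last window // 2 points are left unchanged here."""
    y = np.asarray(spectrum, dtype=float)
    c = savgol_coefficients(window, polyorder)
    half = window // 2
    out = y.copy()
    out[half:len(y) - half] = np.correlate(y, c, mode="valid")
    return out


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


x = np.arange(50, dtype=float)
quadratic = 0.5 * x**2 - 3.0 * x + 2.0
smoothed = savgol_smooth(quadratic, window=11, polyorder=2)
```

For a window of 5 and a quadratic fit the weights reduce to the classic (-3, 12, 17, 12, -3)/35, and any quadratic signal passes through unchanged, which makes a convenient self-check.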


Volume 39, Issue 3

Citations: 0

Determination of Halitosis by Exhaled Breath Analysis Using Semiconductor Metal Oxide Sensors and Chemometric Methods

IF 2.1, Zone 4 (Chemistry), Q1 SOCIAL WORK

Journal of Chemometrics

Pub Date : 2025-02-17 DOI: 10.1002/cem.70012

Mikhail Saveliev, Andrey Volchek, Galina Lavrenova, Ol'ga Malay, Mikhail Grevtsev, Igor Jahatspanian

Halitosis is a condition characterized by bad breath. Although it is a disease in its own right, it is often a symptom of more serious diseases (diabetes mellitus, renal failure, azotemia, etc.). Halitosis is currently diagnosed by the organoleptic method, in which a trained specialist evaluates the patient's breath odor. This approach is subjective, uncomfortable for both patient and doctor, and requires a specially trained professional. As an alternative, instrumental diagnostics using metal oxide semiconductor (MOS) sensor arrays offer a promising avenue by classifying patients with previously developed models. This paper considers seven MOS sensors of different compositions operated at three different temperatures. Several chemometric data analysis methods were applied: k-nearest neighbors (kNN), decision trees (DT), support vector machines (SVM), logistic regression (LR), and projection to latent structures discriminant analysis (PLSDA). All methods proved effective, achieving selectivity, sensitivity, and accuracy above 85%. Additionally, a combined classifier using the responses of all the individual classifiers was explored and achieved near-perfect classification accuracy.
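The abstract does not say how the combined classifier merges the individual models. One simple possibility, assumed here purely for illustration, is a majority vote over hard class labels. The labels below are invented so that each model makes exactly one error, which the vote corrects.

```python
import numpy as np


def majority_vote(predictions):
    """Combine hard labels from several classifiers by per-sample majority vote.

    predictions: (n_classifiers, n_samples) array of class labels.
    Ties go to the smallest label, because np.argmax returns the first maximum.
    """
    preds = np.asarray(predictions)
    labels = np.unique(preds)
    counts = np.stack([(preds == label).sum(axis=0) for label in labels])
    return labels[np.argmax(counts, axis=0)]


# Hypothetical labels (1 = halitosis, 0 = healthy) from the five model types named above.
y_true = np.array([1, 0, 1, 1, 0])
preds = np.array([
    [0, 0, 1, 1, 0],  # kNN
    [1, 1, 1, 1, 0],  # DT
    [1, 0, 0, 1, 0],  # SVM
    [1, 0, 1, 0, 0],  # LR
    [1, 0, 1, 1, 1],  # PLSDA
])
combined = majority_vote(preds)
accuracy = np.mean(combined == y_true)
```

Each individual model here scores 80%, while the vote recovers every label; in real data the gain depends on how independent the models' errors are.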


Volume 39, Issue 2

Citations: 0

A Multiple Linear Regression–Based Algorithm to Correct for Cosmic Rays in Raman Images

IF 2.1 CAS Zone 4 (Chemistry) Q1 SOCIAL WORK

Journal of Chemometrics

Pub Date : 2025-02-17 DOI: 10.1002/cem.70000

Hery Mitsutake, Eneida de Paula, Heloisa N. Bordallo, Douglas N. Rutledge

Raman imaging is a powerful technique for simultaneously obtaining chemical and spatial information on diverse materials. Because of its high sensitivity, the charge-coupled device (CCD) is one of the most common detectors in Raman instruments. However, CCDs are also sensitive to cosmic rays, which generate very narrow, intense signals known as cosmic ray spikes. Since these spikes can be very intense and numerous, they must be removed before any data analysis. Some correction methods identify spikes by comparing neighboring pixels, but in line-scanning acquisition mode the spikes often appear in two or more adjacent pixels, so a spike-free neighbor is not always available for comparison. This work therefore presents a new algorithm, based on multiple linear regression (MLR), for correcting cosmic ray spikes in Raman images. The algorithm processes images containing more than 70,000 spectra in less than 1 min and removes all spikes, including low-intensity ones.
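The abstract does not specify which regressors the MLR uses. A minimal sketch, assuming each spectrum is modelled by MLR on three regressors (an intercept, a linear baseline term, and the image's median spectrum), flags channels with large positive residuals as spikes and replaces them with the fitted values. The regressor choice, the MAD-based threshold, and the names `mlr_despike`, `z_thresh`, and `n_iter` are our assumptions, not the authors' published method. Because the reference is a median over all pixels rather than a comparison with neighbors, spikes that hit adjacent pixels in line-scan mode do not mask each other.

```python
import numpy as np


def mlr_despike(spectra, z_thresh=6.0, n_iter=2):
    """Flag and replace cosmic-ray spikes with a per-spectrum MLR fit (sketch).

    spectra: array of shape (n_spectra, n_channels), one row per pixel.
    Returns (cleaned spectra, boolean spike mask of the same shape).
    """
    spectra = np.asarray(spectra, dtype=float)
    n_spec, n_chan = spectra.shape
    # Assumed regressors: intercept, linear baseline, image median spectrum.
    reference = np.median(spectra, axis=0)  # robust while spikes are sparse
    X = np.column_stack([np.ones(n_chan), np.linspace(-1.0, 1.0, n_chan), reference])

    cleaned = spectra.copy()
    mask = np.zeros(spectra.shape, dtype=bool)
    for i, y in enumerate(spectra):
        keep = np.ones(n_chan, dtype=bool)
        for _ in range(max(1, n_iter)):
            # Least-squares MLR fit on channels not (yet) flagged as spikes.
            coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
            fitted = X @ coef
            resid = y - fitted
            # Robust noise scale from the median absolute deviation.
            r = resid[keep]
            scale = 1.4826 * np.median(np.abs(r - np.median(r))) + 1e-12
            # Cosmic rays only add intensity, so flag large positive residuals.
            spikes = resid > z_thresh * scale
            keep = ~spikes
        cleaned[i, spikes] = fitted[spikes]
        mask[i] = spikes
    return cleaned, mask
```

A single median reference suits images dominated by one spectral profile; for chemically heterogeneous samples, more reference spectra (for example, pure-component spectra) would have to be added as extra columns of the design matrix `X`.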

Citations: 0

Multimodal Stacked Modeling for Simultaneous Detection of Nutrient Concentrations With Turbidity Correction

IF 2.1 CAS Zone 4 (Chemistry) Q1 SOCIAL WORK

Journal of Chemometrics

Pub Date : 2025-02-17 DOI: 10.1002/cem.70009

Meryem Nini, Mohamed Nohair

This paper investigates an innovative method for the simultaneous determination of nitrite, nitrate, and chemical oxygen demand (COD) in water when turbidity acts as a source of noise in the spectroscopic data. UV–Vis absorption spectrometry and advanced machine learning are combined in a stacking model, an approach that feeds the predictions of several base models (PLS, Lasso, and Ridge regression) into a meta-regressor (Random Forest regressor). Baseline correction and principal component analysis (PCA) are incorporated to mitigate the effects of turbidity on the spectra and improve prediction accuracy. After these corrections, the root mean square error (RMSE) and mean absolute error (MAE) were significantly reduced, and the R² between predicted and actual values of nitrite, nitrate, COD, and turbidity exceeded 0.96 for all four variables in the test data set. These results show that the stacking model can accurately predict several nutrient concentrations simultaneously, even in complex environments, and it may provide a valuable alternative to wet chemical methods. Because of its high accuracy and fast response, the model could serve as the algorithm behind nutrient sensors. This paper highlights the importance of integrating advanced modeling and data correction techniques to improve the robustness and accuracy of predictive models in environmental chemistry, thereby supporting environmental monitoring and management.

Journal of Chemometrics, Vol. 39, No. 3 (2025-02-17). DOI: 10.1002/cem.70009

Citations: 0