Chemometrics and Intelligent Laboratory Systems: Latest Articles

Combining PLS-DA and SIMCA on NIR data for classifying raw materials for tyre industry: A hierarchical classification model
IF 3.9; Zone 2 (Chemistry); Q2 AUTOMATION & CONTROL SYSTEMS; Pub Date: 2024-05-19; DOI: 10.1016/j.chemolab.2024.105150
Riccardo Voccio, Cristina Malegori, Paolo Oliveri, Federica Branduani, Marco Arimondi, Andrea Bernardi, Giorgio Luciano, Mattia Cettolin

Tyre materials are complex products, prepared from a number of raw materials, each with its own chemical composition and function in the final product. It is therefore crucial to avoid mislabeling errors and to verify the compliance of raw materials entering the factory.

The present study proposes a strategy that combines near-infrared (NIR) spectroscopy with chemometrics for raw material identification (RMID) and compliance verification of the most common raw materials used in the tyre industry. The chemometric model is a global hierarchical classification model that combines, in a two-step approach, nested PLS-DA nodes for RMID and SIMCA nodes for compliance verification.

The global model showed satisfactory results: for most of the classes of interest, 100 % total correct predictions and a sensitivity higher than 90 % were obtained in the test set.

The final goal of the strategy is to apply it directly to raw materials at the factory receiving stage, with the double advantage of minimizing the risk of mislabeling and, at the same time, reducing the number of suspicious samples that must be analyzed in the laboratory by traditional methods to verify their compliance.
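The two figures of merit quoted above (total correct predictions and per-class sensitivity) can be illustrated with a minimal sketch. It assumes the usual definitions: sensitivity is the fraction of true members of a class that are assigned to that class, and the total correct rate is the fraction of all samples labeled correctly. The class names and labels are hypothetical and are not taken from the paper.

```python
from collections import Counter

def per_class_sensitivity(y_true, y_pred):
    """Return {class: TP / (TP + FN)} for every class present in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}

def total_correct_rate(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical test-set labels (illustrative only, not from the paper)
y_true = ["rubber", "rubber", "silica", "silica", "silica", "oil"]
y_pred = ["rubber", "rubber", "silica", "silica", "oil", "oil"]

sens = per_class_sensitivity(y_true, y_pred)
rate = total_correct_rate(y_true, y_pred)
print(sens, rate)
```

Reporting sensitivity per class rather than a single pooled figure makes it visible when one material class is recognized less reliably than the others.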

Citations: 0
An alternative for the robust assessment of the repeatability and reproducibility of analytical measurements using bivariate dispersion
IF 3.9; Zone 2 (Chemistry); Q2 AUTOMATION & CONTROL SYSTEMS; Pub Date: 2024-05-18; DOI: 10.1016/j.chemolab.2024.105148
Elfried Salanon, Blandine Comte, Delphine Centeno, Stéphanie Durand, Estelle Pujos-Guillot, Julien Boccard

Introduction

Assessing repeatability and reproducibility in analytical chemistry is commonly based on parametric dispersion indicators, such as the standard deviation and the relative standard deviation, calculated for each detected variable from repeated measurements of Quality Control (QC) samples collected throughout the data acquisition sequence. However, their reliability depends strongly on the assumption of a normal distribution. Because analytical variability arises from many sources, such parametric estimators are not always suitable. There is therefore a need for robust indicators of data quality that are independent of central values and of any parametric assumption.
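As a small illustration of why a parametric indicator can be fragile, the sketch below computes the relative standard deviation (100 × sample standard deviation / mean) for a set of QC intensities with and without a single aberrant value. The numbers are hypothetical and chosen only to show the effect.

```python
import statistics

def rsd_percent(values):
    """Relative standard deviation in percent: 100 * sample SD / mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical QC intensities for one detected variable (illustrative values)
qc_clean = [100.0, 102.0, 98.0, 101.0, 99.0]
qc_outlier = qc_clean[:-1] + [150.0]  # same series with one aberrant injection

rsd_clean = rsd_percent(qc_clean)
rsd_outlier = rsd_percent(qc_outlier)
print(round(rsd_clean, 2), round(rsd_outlier, 2))
```

A single deviating QC injection inflates the RSD by more than an order of magnitude here, which is the kind of sensitivity that motivates indicators not tied to the mean.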

Methods

Three specific indicators were developed: (i) intra-group dispersion, based on the median area of the convex hull of QC samples within an analytical batch; (ii) inter-group dispersion, defined as the gradient of the deviation between analytical batches; and (iii) dispersion index. Mathematical properties of these indicators, including positivity, stability, and translation invariance, were then evaluated using synthetic data under normal and non-normal distributions. Finally, the relevance of these indicators and the associated visualization methods were highlighted based on a metabolomics case study involving liquid chromatography coupled to mass spectrometry measurements of the NIST SRM1950 reference material analyzed over more than one year within different projects.
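The convex-hull idea behind the intra-group indicator can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which two dimensions are used or exactly how the median is taken, so the sketch computes one hull area per batch from hypothetical two-dimensional QC coordinates and then takes the median across batches. The function names are illustrative.

```python
from statistics import median

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def hull_area(points):
    """Shoelace area of the convex hull (0.0 when the points span no area)."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0
    s = 0.0
    for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

# Hypothetical QC coordinates in two dimensions, grouped by analytical batch
batches = {
    "batch_1": [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)],
    "batch_2": [(0, 0), (4, 0), (0, 3), (1, 1)],
}
areas = {name: hull_area(pts) for name, pts in batches.items()}
median_area = median(areas.values())

# Shifting every point by the same offset leaves the hull area unchanged
shifted = [(x + 10, y - 5) for x, y in batches["batch_1"]]
print(areas, median_area, hull_area(shifted))
```

Because the hull area depends only on the relative positions of the points, it is non-negative and unaffected by translation, which matches two of the properties the abstract lists.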

Results

The proposed indicators were shown to be translation invariant and always positive, while first investigations on synthetic data revealed a high stability under multiplication. Their application to experimental data revealed specific behaviors depending on the characteristics of the signal associated with the different detected analytes, showing their ability to capture the variability observed under both parametric and non-parametric conditions. This investigation also showed different structures of sensitivity to analytical variability along the data processing steps. Finally, the proposed indicators allowed the analytical drift to be visualized in two dimensions, which facilitates the interpretation of results.

Conclusion

These indicators open the way not only to a better and more robust assessment of repeatability and reproducibility, but also to improved long-term data comparability involving suitability testing.

Citations: 0
Improving golden jackel optimization algorithm: An application of chemical data classification
IF 3.9; Zone 2 (Chemistry); Q2 AUTOMATION & CONTROL SYSTEMS; Pub Date: 2024-05-17; DOI: 10.1016/j.chemolab.2024.105149
Aiedh Mrisi Alharthi, Dler Hussein Kadir, Abdo Mohammed Al-Fakih, Zakariya Yahya Algamal, Niam Abdulmunim Al-Thanoon, Maimoonah Khalid Qasim

High dimensionality is one of the main issues affecting the effectiveness of quantitative structure-activity relationship (QSAR) classification techniques in chemometrics. Feature selection is a critical step that identifies the most relevant and important features of a dataset. By reducing the number of features, it increases classification accuracy, lowers the computational load, and improves overall performance. The recently introduced golden jackal optimization (GJO) algorithm has been used successfully to solve various continuous optimization problems. This study therefore proposes an improved GJO algorithm employing chaotic maps, abbreviated as CGJO, to enhance the exploration and exploitation capability of GJO when selecting the essential descriptors in QSAR classification models, with high classification accuracy and less computation time. Experimental findings on four different high-dimensional chemical datasets show that the proposed CGJO algorithm can maximize classification accuracy while simultaneously decreasing the number of selected descriptors and the computation time. The proposed algorithm can thus be useful for chemical data classification in other QSAR modeling tasks.
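To make the role of a chaotic map concrete, here is a minimal sketch using the logistic map, a common choice in chaotic metaheuristics. The abstract does not say which map CGJO uses or where it enters the algorithm, so both the map and the thresholding into a binary descriptor mask are assumptions made for illustration; this is not an implementation of GJO or CGJO.

```python
def logistic_map(x0, n, r=4.0):
    """Return n successive iterates of x <- r * x * (1 - x), starting from x0."""
    values, x = [], x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        values.append(x)
    return values

def chaotic_mask(n_features, x0=0.7, threshold=0.5):
    """Hypothetical helper: keep descriptor i when the i-th chaotic value exceeds threshold."""
    return [1 if v > threshold else 0 for v in logistic_map(x0, n_features)]

seq = logistic_map(0.7, 5)
mask = chaotic_mask(5)
print(seq, mask)
```

Unlike a pseudo-random generator, the sequence is fully determined by its starting value, yet it still spreads over the unit interval, which is the property such maps are typically used for when diversifying a search.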

Citations: 0
Identifying 124 new anti-HIV drug candidates in a 37 billion-compound database: An integrated approach of machine learning (QSAR), molecular docking, and molecular dynamics simulation
IF 3.9; Zone 2 (Chemistry); Q2 AUTOMATION & CONTROL SYSTEMS; Pub Date: 2024-05-15; DOI: 10.1016/j.chemolab.2024.105145
Alexandre de Fátima Cobre, Anderson Ara, Alexessander Couto Alves, Moisés Maia Neto, Mariana Millan Fachi, Laize Sílvia dos Anjos Botas Beca, Fernanda Stumpf Tonin, Roberto Pontarolo

Recent data from the World Health Organization reveal that in 2023, 38.8 million people were living with HIV. Within this population, there were 1.5 million new cases and 650 thousand deaths attributed to the disease. This study employs an integrated approach involving QSAR-based machine learning models, molecular docking, and molecular dynamics simulations to identify potential compounds for inhibiting the bioactivity of the CC chemokine receptor type 5 (CCR5) protein, a key entry point for HIV. Using non-redundant experimental data from the ChEMBL database, 40 different machine learning algorithms were trained, and the top four models (XGBoost, Histogram-based Gradient Boosting, Light Gradient Boosted Machine, and Extra Trees Regression) were used to predict anti-HIV bioactivity for 37 billion compounds in the ZINC-22 database. The screening identified 124 new anti-HIV drug candidates, confirmed through molecular docking and dynamics simulations. The study underscores the therapeutic potential of these compounds, paving the way for further in vitro and in vivo investigations. The convergence of machine learning and experimental findings presents a promising avenue for significant advancements in pharmaceutical research, particularly in the treatment of viral diseases such as HIV. To guarantee the reproducibility of the study, the Python code (Google Colab) and the associated database are available on GitHub: https://github.com/AlexandreCOBRE/code.
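The abstract does not state how the predictions of the four selected models were combined during screening. One plausible scheme, shown here purely as an assumption, is a consensus filter that keeps a compound when the mean predicted activity across models reaches a threshold. The compound identifiers, scores, and threshold below are made up for illustration.

```python
from statistics import mean

def consensus_hits(predictions, threshold):
    """predictions: {compound_id: [score from each model]}.
    Keep the ids whose mean score is at least `threshold` (hypothetical rule)."""
    return sorted(cid for cid, scores in predictions.items() if mean(scores) >= threshold)

# Hypothetical predicted activities from four models (values are invented)
predictions = {
    "cmpd_A": [7.9, 8.1, 8.0, 8.2],
    "cmpd_B": [5.1, 6.0, 5.5, 5.8],
    "cmpd_C": [7.0, 7.4, 6.8, 7.2],
}
hits = consensus_hits(predictions, threshold=7.0)
print(hits)
```

Averaging over several models damps the effect of any single model's outlying prediction before the more expensive docking and dynamics steps.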

Citations: 0
MA-XRF datasets analysis based on convolutional neural network: A case study on religious panel paintings
IF 3.9; Zone 2 (Chemistry); Q2 AUTOMATION & CONTROL SYSTEMS; Pub Date: 2024-05-10; DOI: 10.1016/j.chemolab.2024.105138
Theofanis Gerodimos, Ioannis Georvasilis, Anastasios Asvestas, Georgios P. Mastrotheodoros, Aristidis Likas, Dimitrios F. Anagnostopoulos

Macroscopic X-ray fluorescence (MA-XRF) datasets are analyzed using Artificial Neural Networks. Specifically, Convolutional Neural Networks (CNNs) are trained by coupling the spectra acquired during the MA-XRF scan of two religious panel paintings (“icons”) with the associated Ground-Truth (GT) counts per characteristic transition line, as extracted by X-ray fluorescence fundamental parameters analysis. In total, twenty thousand XRF spectra were used for the CNN training. The trained networks were applied to analyze millions of MA-XRF spectra acquired during the scan of religious panel paintings, computing the counts per pixel of the X-ray characteristic transition lines and creating the elemental transition maps. The CNN-extracted results show remarkable agreement with the GT. The successful analysis of MA-XRF datasets with the CNN method paves the way toward automatic identification of spectral lines, giving inexperienced XRF analysts the means to perform a state-of-the-art analysis and helping experienced users avoid overlooking poorly resolved transition lines.
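The core operation a convolutional layer applies to a spectrum can be sketched in a few lines. The example below is a generic illustration, not the architecture used in the paper: the spectrum, the hand-picked kernel, and the function names are hypothetical, and a trained CNN would learn its kernel weights from the spectra and their Ground-Truth counts.

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D cross-correlation, the core operation of a convolutional layer."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(values):
    """Rectified linear unit: negative responses are set to zero."""
    return [max(0.0, v) for v in values]

# Hypothetical 8-channel spectrum with a single peak at channel index 4
spectrum = [0, 0, 1, 4, 9, 4, 1, 0]
# Hand-picked peak-sensitive kernel (a trained network would learn these weights)
kernel = [-1, 2, -1]

features = relu(conv1d(spectrum, kernel))
peak_channel = features.index(max(features)) + len(kernel) // 2
print(features, peak_channel)
```

The same small kernel slides over every channel, so a peak produces the same response wherever it sits in the spectrum, which is one reason convolutional layers suit spectral data.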

Citations: 0
Automatic non-destructive estimation of polyphenol oxidase and peroxidase enzyme activity levels in three bell pepper varieties by Vis/NIR spectroscopy imaging data based on machine learning methods
IF 3.9; Zone 2 (Chemistry); Q2 AUTOMATION & CONTROL SYSTEMS; Pub Date: 2024-05-09; DOI: 10.1016/j.chemolab.2024.105137
Meysam Latifi Amoghin, Yousef Abbaspour-Gilandeh, Mohammad Tahmasebi, Juan Ignacio Arribas

The browning of food products often occurs upon cutting and damage during processing, transport, and storage, among other potential causes. Enzymatic browning is mainly due to the polyphenol oxidase (PPO) and peroxidase (POD) enzymes. In this study, visible/near-infrared (Vis/NIR) imaging spectroscopy in the range of 350–1150 nm was used for automatic and non-destructive evaluation of PPO and POD activity levels in three bell pepper varieties (red, yellow, orange), with 30 samples per variety (N = 30). The spectral data were first modeled by partial least squares regression (PLSR) over the whole spectral range, without selecting any subset of the most effective wavelengths (EWs). On the validation set, the coefficient of determination (R2) for the estimation of POD activity levels was 0.794, 0.772, and 0.726 for red, yellow, and orange bell peppers, respectively; for PPO activity levels, R2 was 0.901, 0.810, and 0.859, respectively. In addition, hybrid machine learning (ML) techniques combining a support vector machine (SVM) with genetic algorithms (GA), particle swarm optimization (PSO), ant colony optimization (ACO), or imperialistic competitive algorithms (ICA) were used to select the optimal (discriminant) EWs, and regression performance improved consistently, judging from the higher R2 values. Either 14 or 15 EWs were computed and selected in order of their discriminative power using these ML techniques. The hybrid SVM-PSO method proved the best at selecting the most effective wavelengths. Finally, three regression methods, PLSR, multiple linear regression (MLR), and a neural network (NN), were employed to model the EWs selected by SVM-PSO. Over the test set, the non-linear NN regression gave better ratio of performance to deviation (RPD), R2, and root mean square error (RMSE) values than the other two methods, closely followed by PLSR; NN regression was therefore selected as the best approach for modeling the most effective spectral wavelengths in this study.
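The three figures of merit named above can be computed as in the following sketch. It assumes the common definitions: RMSE is the root of the mean squared prediction error, R2 is one minus the ratio of residual to total sum of squares, and RPD is the sample standard deviation of the reference values divided by the RMSE. The reference and predicted values are hypothetical.

```python
import math
import statistics

def rmse(y_true, y_pred):
    """Root mean square error of the predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = statistics.mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rpd(y_true, y_pred):
    """Ratio of performance to deviation: sample SD of reference values / RMSE."""
    return statistics.stdev(y_true) / rmse(y_true, y_pred)

# Hypothetical reference enzyme activities and model predictions
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.5, 2.0, 2.5, 4.0, 5.0]

print(r_squared(y_true, y_pred), rmse(y_true, y_pred), rpd(y_true, y_pred))
```

RPD complements R2 and RMSE by expressing the prediction error relative to the natural spread of the reference values, so it can be compared across analytes measured on different scales.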

Citations: 0
A new sub-class linear discriminant for miniature spectrometer based food analysis
IF 3.9, Zone 2 (Chemistry), Q2 AUTOMATION & CONTROL SYSTEMS, Pub Date: 2024-05-04, DOI: 10.1016/j.chemolab.2024.105136
Omar Nibouche, Fayas Asharindavida, Hui Wang, Jordan Vincent, Jun Liu, Saskia van Ruth, Paul Maguire, Enayet Rahman

The well-known and extensively studied Linear Discriminant Analysis (LDA) can perform poorly when data are not homoscedastic or not Gaussian. In such cases the classical assumptions under which LDA models are built do not hold, and LDA projections cannot extract the features needed to explain the intrinsic structure of the data and to separate the classes. As with many real-world data sets, data obtained using miniature spectrometers can suffer from these drawbacks, which limits the deployment of this technology for food analysis. The solution presented in the paper is to divide classes into subclasses and to use the means of the subclasses, the classes, and the whole data set in the suggested between-class scatter metric. Further, samples belonging to the same subclass are used to build a measure of within-subclass scatter. This addresses the shortcoming of classical LDA. Results obtained with the proposed solution on food data and on general machine learning data sets show that it is competitive with similar sub-class LDA algorithms in the literature. An extension to a Hilbert space is also presented, and the kernel version of the presented solution can be fused with its linear counterpart to yield improved classification rates.
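
For orientation, the classical LDA quantities that the abstract builds on can be written as below. The notation ($x$ for a sample, $C_i$ and $\mu_i$ for class $i$ and its mean, $n_i$ for its size, $\mu$ for the global mean, $C_{ij}$ and $\mu_{ij}$ for subclass $j$ of class $i$) is assumed here and is not taken from the paper. The last line is only a generic within-subclass scatter; the paper's own between-class scatter, built from subclass, class, and global means, is not reproduced because the abstract does not state it.

```latex
\begin{align}
S_W &= \sum_{i=1}^{K} \sum_{x \in C_i} (x - \mu_i)(x - \mu_i)^{\top}
      && \text{(classical within-class scatter)} \\
S_B &= \sum_{i=1}^{K} n_i \, (\mu_i - \mu)(\mu_i - \mu)^{\top}
      && \text{(classical between-class scatter)} \\
w^{\ast} &= \arg\max_{w} \frac{w^{\top} S_B \, w}{w^{\top} S_W \, w}
      && \text{(Fisher criterion)} \\
S_{WS} &= \sum_{i=1}^{K} \sum_{j=1}^{H_i} \sum_{x \in C_{ij}} (x - \mu_{ij})(x - \mu_{ij})^{\top}
      && \text{(generic within-subclass scatter)}
\end{align}
```

Here $K$ is the number of classes and $H_i$ the number of subclasses of class $i$. When a class is multimodal, $\mu_i$ is a poor summary of it, which is the usual motivation for measuring scatter around the subclass means $\mu_{ij}$ instead.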

Citations: 0
Experimental-based groundwater salinization from the carbonate aquifer of eastern Saudi Arabia: Insight into machine learning coupled with meta-heuristic algorithms
IF 3.9, Zone 2 (Chemistry), Q2 AUTOMATION & CONTROL SYSTEMS, Pub Date: 2024-05-01, DOI: 10.1016/j.chemolab.2024.105135
Mohammed Benaafi, Sani I. Abba, Mojeed Opeyemi Oyedeji, Auwalu Saleh Mubarak, Jamilu Usman, Isam H. Aljundi

Groundwater (GW) salinization of coastal aquifers has become a serious obstacle to sustainable water resource management in Saudi Arabia and other parts of the world. It is therefore crucial to assess the extent of this salinization in order to protect and manage water resources effectively. In this research, GW samples were collected in the field at several locations and analyzed experimentally by ion chromatography (IC) and inductively coupled plasma mass spectrometry (ICP-MS) to determine several physical, chemical, and hydro-geochemical parameters. GW salinization was then modeled with machine learning algorithms: support vector regression, Gaussian process regression (GPR), artificial neural networks, and a least squares ensemble boosting regression tree. The performance of the standalone models was optimized with metaheuristic optimization-based algorithms, namely the fuzzy hybridized genetic algorithm (ANFIS-GA) and particle swarm optimization (ANFIS-PSO). The outcomes based on three variable input combinations were validated using several performance indicators and graphical methods. The quantitative analysis indicated that GPR-Combo1 (MAE = 0.006 mg/L), Ensm-Combo2 (MAE = 0.025 mg/L), and GPR-Combo3 (MAE = 0.078 mg/L) performed best among the standalone combinations, where Combo 1, 2, and 3 denote the model combinations derived from feature selection. The cumulative probability function (CPF) demonstrated that the heuristic optimizations ANFIS-GA (MAE = 0.0025 mg/L, MAPE = 0.19183) and ANFIS-PSO (MAE = 0.0018 mg/L, MAPE = 0.0723) achieved lower errors than the standalone models and proved a reliable approach. Both the standalone models and the heuristic algorithms used for GW salinization modeling gave promising results in accurately predicting salinity. This approach could aid in effectively managing GW resources for sustainable development.
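
The two error measures quoted in this abstract, MAE and MAPE, follow their standard textbook definitions, sketched below. The function names and the sample values are hypothetical and are not taken from the paper. MAPE is returned as a fraction here; the abstract does not say whether its MAPE values are fractions or percentages.

```python
from typing import Sequence


def mean_absolute_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """MAE: mean of |observed - predicted|, in the units of the target (e.g. mg/L)."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mean_absolute_percentage_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """MAPE as a fraction: mean of |(observed - predicted) / observed|.

    Multiply by 100 for a percentage. Undefined when any observed value is 0.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined when an observed value is zero")
    return sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / len(observed)


# Made-up salinity values (mg/L), for illustration only.
observed = [2.0, 4.0, 5.0]
predicted = [2.5, 3.0, 5.0]

print("MAE :", mean_absolute_error(observed, predicted))
print("MAPE:", mean_absolute_percentage_error(observed, predicted))
```

MAE keeps the units of the measured quantity, whereas MAPE is scale-free, so the two can rank models differently when the observed values span a wide range.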

Citations: 0
On the factor ambiguity of MCR problems for blockwise incomplete data sets
IF 3.9, Zone 2 (Chemistry), Q2 AUTOMATION & CONTROL SYSTEMS, Pub Date: 2024-04-27, DOI: 10.1016/j.chemolab.2024.105134
Martina Beese, Tomass Andersons, Mathias Sawall, Cyril Ruckebusch, Adrián Gómez-Sánchez, Robert Francke, Adrian Prudlik, Robert Franke, Klaus Neymeyr

Multivariate curve resolution (MCR) methods are sometimes faced with missing or erroneous data, e.g., due to sensor saturation. In some cases, an estimation of the missing data is possible, but often MCR works with the largest submatrix without missing entries. This ignores all rows and columns of the data matrix that contain missing values. A successful approach to deal with incomplete data multisets has been proposed by Alier and Tauler (2013), but it does not include a factor ambiguity analysis. Here, the missing data problem is addressed in combination with a factor ambiguity analysis. An approach is presented that minimizes the factor ambiguity by extracting a maximum of spectral information even from incomplete rows and columns of the spectral data matrix. The method requires a high signal-to-noise ratio. Applications are presented for UV/Vis and HSI data.
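
The factor ambiguity mentioned here can be stated with the standard bilinear MCR model, sketched below. The symbols are assumptions made for this note and are not taken from the paper: $D$ is the $n \times m$ data matrix, $C$ the $n \times s$ concentration profiles, $S$ the $m \times s$ spectra, $E$ the residual, and nonnegativity is used as the typical constraint.

```latex
\begin{align}
D &= C S^{\top} + E, \qquad C \ge 0, \quad S \ge 0, \\
C S^{\top} &= \bigl(C T^{-1}\bigr)\bigl(T S^{\top}\bigr)
  \quad \text{for every invertible } T \in \mathbb{R}^{s \times s}, \\
\text{feasible } T &: \quad C T^{-1} \ge 0 \ \text{ and } \ T S^{\top} \ge 0 .
\end{align}
```

Every feasible $T$ gives a different but equally valid pair of factors, and the set of such solutions is the ambiguity being analyzed. Any additional constraint shrinks this set, which is consistent with the abstract's point that extracting information from the incomplete rows and columns reduces the ambiguity compared with using only the largest complete submatrix.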

Citations: 0
Addressing adulteration challenges of dried oregano leaves by NIR HyperSpectral Imaging
IF 3.9, Zone 2 (Chemistry), Q2 AUTOMATION & CONTROL SYSTEMS, Pub Date: 2024-04-23, DOI: 10.1016/j.chemolab.2024.105133
Veronica Ferrari, Rosalba Calvini, Camilla Menozzi, Alessandro Ulrici, Marco Bragolusi, Roberto Piro, Alessandra Tata, Michele Suman, Giorgia Foca

Dried oregano leaves are particularly prone to adulteration because of their widespread distribution and because they are easily mixed with leaves of other plants of lower commercial value, such as olive, myrtle, strawberry tree, or sumac. To reveal the presence of adulteration, in this study we considered an untargeted analytical approach which, instead of involving the a priori selection of specific compounds of interest, focuses on defining the characteristic spectral signature of authentic oregano with respect to its most frequent adulterants. NIR HyperSpectral Imaging (NIR-HSI) is a state-of-the-art, rapid, and non-destructive technique that collects both spectral and spatial information from the sample, making it particularly suitable for characterizing visually heterogeneous samples.

Authentication issues are typically assessed through class modelling techniques, and Soft Independent Modelling of Class Analogy (SIMCA) is one of the most used algorithms in this scenario. However, the high variability and heterogeneity within the authentic oregano class resulted in poor outcomes when SIMCA was applied. As an alternative, the Soft Partial Least Squares Discriminant Analysis (Soft PLS-DA) algorithm was applied to differentiate authentic oregano samples from pure adulterants. Soft PLS-DA is a hybrid approach that combines the advantages of both discriminant and class modelling techniques. The resulting classification model led to promising results, achieving a prediction efficiency of 92.9 %. Finally, based on the percentage of pixels predicted as oregano in the Soft PLS-DA prediction images, a threshold value of 10 % was established, serving as a detection limit of NIR-HSI to distinguish authentic oregano samples from adulterated ones.

Citations: 0