
Journal of Chemometrics: Latest Articles

Resampling as a Robust Measure of Model Complexity in PARAFAC Models
IF 2.4 Zone 4 Chemistry Q1 SOCIAL WORK Pub Date: 2024-09-05 DOI: 10.1002/cem.3601
Helene Fog Froriep Halberg, Marta Bevilacqua, Åsmund Rinnan
Fluorescence spectroscopy has been applied for analysis of complex samples, such as food and beverages. Parallel factor analysis (PARAFAC) is a well‐known decomposition method for fluorescence excitation–emission matrices (EEMs). When the complexity of the system increases, it becomes considerably more difficult to determine the optimal number of PARAFAC components, especially when the fluorophores of the system are unknown. The two commonly applied diagnostics, core consistency and split‐half analysis, appear to underestimate the model complexity due to covarying components and local minima, respectively. As a more robust alternative, we propose a resampling approach with multiple initializations and submodel comparisons for estimating the optimal number of PARAFAC components in complex data.
Citations: 0
A Non‐Linear Model for Multiple Alcohol Intakes and Optimal Designs Strategies
IF 2.4 Zone 4 Chemistry Q1 SOCIAL WORK Pub Date: 2024-08-27 DOI: 10.1002/cem.3599
Irene Mariñas‐Collado, Juan M. Rodríguez‐Díaz, M. Teresa Santos‐Martín
This study addresses the complex dynamics of alcohol elimination in the human body, a topic of considerable importance in forensic and healthcare settings. Existing models often oversimplify by assuming linear elimination kinetics, which limits their practical application. This study presents a novel non‐linear model for estimating blood alcohol concentration after multiple intakes. Initially developed for two different alcohol intakes, it can be straightforwardly extended to the case of more intakes. Emphasising the significance of accurate parameter estimation, the research underscores the importance of precise experimental design, utilising optimal experimental design (OED) methodologies. Sensitivity analysis of model coefficients and the determination of D‐optimal designs, considering correlation structures among observations, reveal a strong linear relationship between support points. This relationship can be used to obtain nearly optimal designs that are highly efficient and much easier to compute.
Citations: 0
Population Power Curves in ASCA With Permutation Testing
IF 2.4 Zone 4 Chemistry Q1 SOCIAL WORK Pub Date: 2024-08-27 DOI: 10.1002/cem.3596
José Camacho, Michael Sorochan Armstrong
In this paper, we revisit the power curves in ANOVA simultaneous component analysis (ASCA) based on permutation testing and introduce the population curves derived from population parameters describing the relative effect among factors and interactions. The relative effect has important practical implications: the statistical power of a given factor depends on the design of other factors in the experiment and not only on the sample size. Thus, understanding the relative power in a specific experimental design can be extremely useful to maximize our chance of success when planning the experiment. In the paper, we derive relative and absolute population curves, where the former represent statistical power in terms of the normalized effect size between structure and noise, and the latter in terms of the sample size. Both types of population curves allow us to make decisions regarding the number and nature (fixed/random) of factors, their relationships (crossed/nested), and the number of levels and replicates, among others, in a multivariate experimental design (e.g., an omics study) during the planning phase of the experiment. We illustrate both types of curves through simulation.
Citations: 0
Chemometric Classification of Motor Oils Using ¹H NMR Spectroscopy With Simultaneous Phase and Baseline Optimization
IF 2.4 Zone 4 Chemistry Q1 SOCIAL WORK Pub Date: 2024-08-26 DOI: 10.1002/cem.3598
A. Olejniczak, J. P. Łukaszewicz
Here, we demonstrate mid‐field ¹H NMR spectroscopy combined with chemometrics to be powerful in the classification and authentication of motor oils (MOs). The ¹H NMR data were processed with a new algorithm for simultaneous phase and baseline correction, which, for crowded spectra such as those of the refinery products, allowed for more accurate estimation of phase parameters than other literature approaches tested. A principal component analysis (PCA) model based on the unbinned CH₃ fingerprint region (0.6–1.0 ppm) enabled the differentiation of hydrocracked and poly‐α‐olefin‐based MOs and was effective in resolving mixtures of these base stocks with conventional base oils. PCA analysis of the 1.0‐ to 1.14‐ppm region enabled the detection of poly(isobutylene) additive and was useful for differentiating between single‐grade and multigrade MOs. Non‐equidistantly binned ¹H NMR data were used to detect the addition of esters and to establish discriminant models for classifying MOs by viscosity grade and by major categories of synthetic, semisynthetic, and mineral oils. The performances of four classifiers (linear discriminant analysis [LDA], quadratic discriminant analysis [QDA], naïve Bayes classifier [NBC], and support vector machine [SVM]) with and without PCA dimensionality reduction were compared. In both tasks, SVM showed the best efficiency, with average error rates of ~2.3% and 8.15% for predicting major MO categories and viscosity grades, respectively. The potential to merge spectra collected from different NMR instruments is discussed for models based on spectral binning. It is also shown that small errors in phase parameters are not detrimental to binning‐based PCA models.
Citations: 0
Some Views on Multi‐criteria Methods for Data Analysis
IF 2.4 Zone 4 Chemistry Q1 SOCIAL WORK Pub Date: 2024-08-22 DOI: 10.1002/cem.3597
Henk A. L. Kiers, Marieke E. Timmerman
Many data analysis methods actually combine optimization of several criteria. In this paper, a framework is offered for categorizing such multi‐criteria methods. In particular, it categorizes multiset and three‐way analysis methods as well as penalized methods and combinations thereof. The framework aims to stimulate critical evaluation of methods and reflection on the purpose of methods and, by signaling gaps, to help the development of new data analysis methods.
Citations: 0
Can Angle Measures Be Useful in MCR Analyses?
IF 2.4 Zone 4 Chemistry Q1 SOCIAL WORK Pub Date: 2024-08-15 DOI: 10.1002/cem.3582
Klaus Neymeyr, Martina Beese, Hamid Abdollahi, Mathias Sawall
In MCR analyses, the similarity of pairs of spectra or concentration profiles can be measured in terms of the acute angle that is enclosed by the representing vectors. Acute angles between vectors can be generalized to pairs of subspaces. So‐called canonical angles, also called principal angles, measure the mutual orientation of a pair of subspaces. This work discusses how angles and canonical angles can support multivariate curve resolution analyses. A canonical angle analysis (CAA) can help to detect changes of the chemical composition during a chemical reaction in a way that is comparable to, but different from, evolving factor analysis (EFA).
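The two angle measures named in this abstract can be illustrated numerically. The following is a minimal sketch and not the authors' implementation: the helper names `acute_angle` and `canonical_angles` are hypothetical, NumPy is assumed to be available, and both input matrices are assumed to have full column rank. Canonical angles are computed here in the standard way, from the singular values of the product of orthonormal bases of the two subspaces.

```python
import numpy as np

def acute_angle(u, v):
    """Acute angle (radians) between two profile vectors; the sign of each vector is ignored."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cosine = abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cosine, 0.0, 1.0)))

def canonical_angles(A, B):
    """Canonical (principal) angles between the column spaces of A and B, in ascending order."""
    # Orthonormal bases of the two column spaces (assumes full column rank).
    Qa, _ = np.linalg.qr(np.asarray(A, dtype=float))
    Qb, _ = np.linalg.qr(np.asarray(B, dtype=float))
    # Singular values of Qa^T Qb are the cosines of the canonical angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(cosines, 0.0, 1.0))
```

Because the cosine is nearly flat close to 1, this formulation loses precision for very small angles; where that matters, a sine-based formulation is usually preferred.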
Citations: 0
Flexible Trilinearity Alignment (FTA) and Shift Invariant Transformation (SIT) Constraints in Three‐Way Multivariate Curve Resolution Data Analysis
IF 2.3 Zone 4 Chemistry Q1 SOCIAL WORK Pub Date: 2024-08-08 DOI: 10.1002/cem.3581
Xin Zhang, R. Tauler
In this work, two alternative ways of analyzing three‐way data with multivariate curve resolution alternating least squares (MCR‐ALS) using the trilinearity constraint are described and compared. Different synthetic datasets and experimental three‐way datasets covering different scenarios are analyzed, and the results obtained are compared. The two new ways of applying the trilinearity constraint are named flexible trilinearity alignment (FTA) and shift invariant transformation (SIT). The effects of noise in the application of both types of constraints are investigated in detail. Results show that both approaches are particularly suitable for cases such as gas chromatography, and especially liquid chromatography, where the elution profiles of the same chemical component in different chromatographic runs are not totally reproducible because they are time shifted, although they preserve their shape. When strong time shifts and co‐elution occur, the “standard” trilinear model does not work, and alternative approaches should be used, such as the MCR extended bilinear model for multiset (multirun) data, or the proposed relaxation of the trilinearity constraint in the FTA and SIT methods to capture the time drift changes produced in the elution profiles of the resolved components.
Citations: 0
Robust Multiplicative Scatter Correction Using Quantile Regression
IF 2.4 Zone 4 Chemistry Q1 SOCIAL WORK Pub Date: 2024-08-06 DOI: 10.1002/cem.3589
Bahram Hemmateenejad, Nabiollah Mobaraki, Knut Baumann
A robust method for multiplicative scatter correction (MSC) in infrared spectroscopy is presented. Using quantile regression, the outlier wavelengths (concentration‐dependent wavelengths) that are irrelevant to the regression are identified and therefore excluded from the regression model. This new MSC method, which could be implemented in its simple or extended form, is much simpler than the recently proposed methods and has only one hyperparameter (the quantile value) to be adjusted. To achieve this, a scoring function based on residual analysis can automatically determine the correct quantile value. The method is first explained using simulation data sets and then its validation is explained by analysing some experimental data sets. It was found that our new method can perform well in the presence of strong outlying variables. On the other hand, when the data sets are not associated with outlying wavelengths, this method behaves similarly to the conventional MSC method.
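For orientation, the conventional MSC that this abstract uses as its baseline can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `msc` is hypothetical, NumPy is assumed to be available, and the per-spectrum fit uses ordinary least squares. It is therefore not the robust method proposed by the authors, which replaces the least-squares fit with a quantile regression.

```python
import numpy as np

def msc(spectra, reference=None):
    """Conventional multiplicative scatter correction (least-squares variant).

    spectra:   2-D array, one spectrum per row.
    reference: 1-D reference spectrum; defaults to the mean spectrum.
    """
    X = np.asarray(spectra, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, x in enumerate(X):
        # Fit x ≈ intercept + slope * ref, then undo the additive and multiplicative effects.
        slope, intercept = np.polyfit(ref, x, 1)
        corrected[i] = (x - intercept) / slope
    return corrected
```

In this least-squares form every wavelength contributes to the fit, which is why strongly concentration-dependent wavelengths can distort the estimated slope and intercept.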
Citations: 0
A Novel One‐Class Convolutional Autoencoder Combined With Excitation–Emission Matrix Fluorescence Spectroscopy for Authenticity Identification of Food
IF 2.4 Zone 4 Chemistry Q1 SOCIAL WORK Pub Date: 2024-08-05 DOI: 10.1002/cem.3592
Xiaoqin Yan, Baoshuo Jia, Wanjun Long, Kun Huang, Tong Wang, Hailong Wu, Ruqin Yu
In this work, a novel one‐class classification algorithm, the one‐class convolutional autoencoder (OC‐CAE), was proposed for the detection of abnormal samples in excitation–emission matrix (EEM) fluorescence spectra datasets. The OC‐CAE used a boxplot to analyze the reconstruction errors and used the LOF algorithm to handle features extracted by the hidden layer in the convolutional autoencoder (CAE). The fused information provides the basis for more accurate pattern recognition, ensures flexibility in model training, and can obtain higher model specificity, which is important in the field of food quality control. To demonstrate the reliability and advantages of OC‐CAE, two EEM cases related to the authentication of food, the Zhenjiang aromatic vinegar (ZAV) case and the camellia oil (CAO) case, were studied. The results showed that OC‐CAE identified all abnormal samples in the two cases, reflecting excellent performance in the detection of abnormal samples, and that it, coupled with EEM, would be an effective tool for the authenticity identification of food.
Citations: 0
Adjusted Pareto Scaling for Multivariate Calibration Models
IF 2.4 Zone 4 Chemistry Q1 SOCIAL WORK Pub Date: 2024-08-03 DOI: 10.1002/cem.3588
Kurt Varmuza, Peter Filzmoser
The performance of multivariate calibration models ŷ = f(x) for the prediction of a numerical property y from a set of x‐variables depends on the type of scaling of the x‐variables. Common scaling methods are autoscaling (dividing the centered x by its standard deviation s) and Pareto scaling (dividing the centered x by s^P with P = 0.5). The adjusted Pareto scaling presented here varies the exponent P between 0 (no scaling) and 1 (autoscaling) with the aim of obtaining an optimum prediction performance for ŷ. Related scaling methods based on the variable spread are range scaling and vast scaling, while level scaling is based on the location (central value) of the variable. These scaling methods and robust versions are compared for models created by partial least‐squares (PLS) regression. The applied strategy, repeated double cross validation (rdCV), evaluates the model performance for test set objects and considers its variability. Results with three data sets from chemistry show: (a) the efficacy of the different scaling methods depends on the data structure; (b) optimization of the Pareto exponent P is recommended; (c) range scaling or vast scaling may be better than adjusted Pareto scaling; (d) in general a heuristic search for the best scaling method is advisable. Overall, the consideration of different variants of scaling allows for a flexible adjustment of the variable contributions to the calibration model.
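The scaling family described in this abstract (centering, then dividing by the standard deviation raised to an exponent P) can be written as a single function. This is a minimal sketch under stated assumptions, not the authors' code: the name `power_scale` is hypothetical, NumPy is assumed to be available, the sample standard deviation (`ddof=1`) is an assumption, and columns with zero variance are not handled.

```python
import numpy as np

def power_scale(X, P=0.5):
    """Center each column of X and divide it by s**P, where s is the column standard deviation.

    P = 0   -> centering only (no scaling)
    P = 0.5 -> Pareto scaling
    P = 1   -> autoscaling
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    s = X.std(axis=0, ddof=1)  # sample standard deviation per variable (assumed convention)
    return centered / s**P
```

Treating P as a tunable value between 0 and 1 means it can be selected like any other model hyperparameter, for example inside a cross-validation loop around the PLS model.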
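The scaling rule described in the abstract (center each x‐variable, then divide by s^P) can be illustrated with a minimal sketch. The function name `adjusted_pareto_scale`, the use of the sample standard deviation (`ddof=1`), and the toy matrix are assumptions made for illustration; they are not taken from the paper. The sketch also does not cover the optimization of P by rdCV with PLS models.

```python
import numpy as np

def adjusted_pareto_scale(X, P=0.5):
    """Center each column of X and divide it by s**P.

    s is the column-wise sample standard deviation (ddof=1, an assumption).
    P = 0 gives centering only, P = 0.5 Pareto scaling, P = 1 autoscaling.
    Columns with s = 0 are not handled here and would divide by zero for P > 0.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)   # remove the column means
    s = X.std(axis=0, ddof=1)       # spread of each x-variable
    return centered / s**P          # broadcast the per-column divisor

# Toy data: two variables with very different spread (s = 1 and s = 20).
X = np.array([[1.0, 10.0],
              [2.0, 30.0],
              [3.0, 50.0]])

for P in (0.0, 0.5, 1.0):
    print(P, adjusted_pareto_scale(X, P)[:, 1])
```

Intermediate values of P shrink the high‐variance variable less than autoscaling does, which is the degree of freedom the adjusted method tunes.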
Citations: 0