Journal of Chemometrics: Latest Articles

Industrial Process Fault Detection Based on IGA-Combinatorial Model Decision Mechanism
IF 2.3 | Zone 4, Chemistry | Q1 SOCIAL WORK | Pub Date: 2024-10-05 | DOI: 10.1002/cem.3602
Shujuan Wei, Yongsheng Qi, Liqiang Liu, Yongting Li, Xuejin Gao

To address the challenges of extracting features from complex industrial process data, the reliance of numerous fault detection methodologies on presupposed data distribution types, and the limited generalization capacity of fault detection, this manuscript introduces a sophisticated algorithm for industrial process fault detection. This algorithm harnesses the information gain adaptive (IGA) technique for feature selection and a synergistic model decision mechanism. Initially, the process involves the computation of information gain via decision trees, coupled with the determination of the $k$ value through cross-validation. This strategy enables the adaptive selection of features, thereby facilitating data dimensionality reduction and effective feature extraction. The subsequent phase introduces a ternary statistical measure monitoring group for the detection of linear faults, while autoencoders and one-class SVM methodologies are applied for the monitoring of nonlinear faults. The culmination of this approach is the development of an innovative weighted decision mechanism, designed to amalgamate the findings from both linear and nonlinear detection avenues, yielding more dependable detection results. The validation of this algorithm employs datasets from the water chillers process and Tennessee Eastman (TE) process, demonstrating the IGA-combined model's superior performance over isolated linear or nonlinear detection algorithms in terms of detection accuracy and robustness. Notably, the efficacy of this method is not contingent upon specific assumptions regarding data distribution, rendering it a versatile and efficacious tool for fault detection in industrial processes.
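
The feature-selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single-threshold split per feature (a depth-one decision stump), uses invented toy data, and fixes k = 2 by hand, whereas the abstract states that k is chosen by cross-validation. All function and feature names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels, threshold):
    """Entropy reduction obtained by splitting `feature` at `threshold`."""
    left = [y for x, y in zip(feature, labels) if x <= threshold]
    right = [y for x, y in zip(feature, labels) if x > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - children


def best_gain(feature, labels):
    """Largest gain over midpoints between consecutive distinct values."""
    values = sorted(set(feature))
    cuts = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max((information_gain(feature, labels, t) for t in cuts), default=0.0)


def select_top_k(columns, labels, k):
    """Names of the k features with the highest information gain."""
    ranked = sorted(columns, key=lambda name: best_gain(columns[name], labels),
                    reverse=True)
    return ranked[:k]


# Invented toy data: 0 = normal operation, 1 = fault.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "temperature": [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],  # separates the classes
    "flow":        [5.0, 6.0, 5.0, 6.0, 5.0, 6.0, 5.0, 6.0],  # carries no information
    "pressure":    [2.0, 2.1, 2.2, 3.0, 2.3, 3.1, 3.2, 3.3],  # partly informative
}
selected = select_top_k(columns, labels, k=2)  # k fixed here, not cross-validated
print(selected)
```

In this toy case the uninformative "flow" variable is dropped; a cross-validated choice of k would wrap `select_top_k` in a loop over candidate values and score a downstream detector for each.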

Spatial–Temporal Deviation Analysis for Multivariate Statistical Process Monitoring
IF 2.3 | Zone 4, Chemistry | Q1 SOCIAL WORK | Pub Date: 2024-10-02 | DOI: 10.1002/cem.3611
Meng Wang, Chudong Tong, Feng Xu, Lijia Luo

Given that an effective process monitoring implementation should take both spatial and temporal variations into account, a novel online process monitoring scheme based on a newly formulated algorithm called spatial–temporal deviation analysis (STDA) is proposed. Unlike mainstream process monitoring methods, which focus on characterizing the spatial and/or temporal variation in historical normal samples, the proposed STDA algorithm is designed to adaptively train, in a timely manner, a pair of projecting vectors that uncover potential deviation in the spatial–temporal variation of online monitored samples, so as to guarantee consistently enhanced monitoring performance. Instead of utilizing a fixed projecting framework trained offline, the STDA algorithm is executed anew each time a newly measured sample becomes available for online monitoring. Therefore, the proposed STDA-based method can consistently ensure its effectiveness for online fault detection, because a projecting framework targeted at revealing deviation in spatial–temporal variation is dynamically determined for each online monitoring sample. Finally, the monitoring performance achieved by the proposed STDA-based approach is evaluated through comparisons with other methods.

A Simulation Study of the Effects of Additive, Multiplicative, Correlated, and Uncorrelated Errors on Principal Component Analysis
IF 2.3 | Zone 4, Chemistry | Q1 SOCIAL WORK | Pub Date: 2024-10-01 | DOI: 10.1002/cem.3595
Edoardo Saccenti, Marieke E. Timmerman, José Camacho

Measurement errors are ubiquitous in all experimental sciences. Depending on the particular experimental platform used to acquire data, different types of errors are introduced, amounting to an admixture of additive and multiplicative error components that can be uncorrelated or correlated. In this paper, we investigate the effect of different types of experimental error on the recovery of the subspace with principal component analysis (PCA) using numerical simulations. Specifically, we assessed how different error characteristics (variance, correlation, and correlation structure), loading structures, and data distributions influence the accuracy of estimating an error-free (true) subspace from sampled data with PCA. Quality was assessed in terms of the mean squared reconstruction error and the congruence to the error-free loadings, using the pseudorank and adjusting for rotational ambiguity. Analysis of variance reveals that the error variance, the error correlation structure, and their interaction with the loading structure are the factors that most affect the quality of loading estimation from sampled data. We advocate for the need to characterize and assess the nature of measurement error and the need to adapt formulations of PCA that can explicitly take into account error structures in the model fitting.
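
Two ingredients of such a simulation can be sketched in a few lines. The error model below, x_obs = x_true · (1 + m) + a with independent Gaussian m and a, is one common way to mix multiplicative and additive noise and is an assumption here, not necessarily the paper's exact model; it covers only the uncorrelated case. The congruence measure is Tucker's coefficient with the absolute value taken to ignore sign flips, which handles only the simplest form of rotational ambiguity. Function names are hypothetical.

```python
import math
import random


def add_errors(x_true, sd_add, sd_mult, rng):
    """Apply uncorrelated multiplicative and additive Gaussian errors.

    Assumed model: x_obs = x_true * (1 + m) + a,
    with m ~ N(0, sd_mult^2) and a ~ N(0, sd_add^2) drawn independently.
    """
    return [x * (1.0 + rng.gauss(0.0, sd_mult)) + rng.gauss(0.0, sd_add)
            for x in x_true]


def congruence(a, b):
    """Tucker congruence between two loading vectors, ignoring sign."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return abs(dot) / (norm_a * norm_b)


rng = random.Random(1)
true_loading = [0.5, 0.5, 0.5, 0.5]
noisy_loading = add_errors(true_loading, sd_add=0.05, sd_mult=0.10, rng=rng)
print(congruence(true_loading, noisy_loading))
```

A value close to 1 means the estimated loading points in nearly the same direction as the error-free one; a value near 0 means the two are nearly orthogonal.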

Topics in Chemometrics TIC 2023 in Rostock, Germany
IF 2.3 | Zone 4, Chemistry | Q1 SOCIAL WORK | Pub Date: 2024-09-30 | DOI: 10.1002/cem.3612
Klaus Neymeyr, Mathias Sawall
Evaluation of Classical Least Squares Discriminant Analysis (CLS-DA) as a Novel Supervised Pattern Recognition Technique
IF 2.3 | Zone 4, Chemistry | Q1 SOCIAL WORK | Pub Date: 2024-09-28 | DOI: 10.1002/cem.3609
Somaye Vali Zade, Hamid Abdollahi

Multivariate calibration techniques and machine learning algorithms are inextricably linked within the realm of chemometrics and data analysis. Classical least squares (CLS) modeling, a fundamental multivariate regression approach, has traditionally been utilized for quantitative analysis tasks, establishing relationships between predictor variables (e.g., spectroscopic data) and response variables (e.g., chemical concentrations). However, a unique feature of CLS is its ability to handle scenarios with partial knowledge of the independent variable matrix, making it an intriguing candidate for qualitative pattern recognition and discriminant analysis applications. This study proposes a novel approach, Classical Least Squares Discriminant Analysis (CLS-DA), which combines the principles of CLS modeling with discriminant analysis objectives. The performance of CLS-DA is comprehensively evaluated using two real-world datasets: chemical analysis of three wine cultivars and mid-infrared spectroscopy of minced meat samples (pork, chicken, and turkey). The results are compared against the well-established Partial Least Squares Discriminant Analysis (PLS-DA) method, a widely adopted technique for classification tasks in chemometrics. For both sets of experimental data, CLS-DA and PLS-DA showed comparable efficiency. For the classification of three types of wine, the accuracy of the proposed method was 94.3%, while the accuracy of the reference method was 98.1%. For the classification of minced meat samples, the accuracies of CLS-DA and PLS-DA were 97.2% and 94%, respectively for all three groups. The findings demonstrate the potential of CLS-DA as a straightforward and interpretable supervised pattern recognition technique, exhibiting comparable classification performance to PLS-DA. The study highlights the advantages of CLS-DA, including its ability to operate within the original data space and its flexibility in accommodating partial knowledge scenarios. 
The proposed CLS-DA approach presents a promising alternative for discriminant analysis, offering new perspectives on the applications of classical least squares modeling in chemometrics.

Near-Infrared Spectroscopy and Aquaphotomics in Cancer Research: A Pilot Study
IF 2.3 | Zone 4, Chemistry | Q1 SOCIAL WORK | Pub Date: 2024-09-24 | DOI: 10.1002/cem.3600
Anastasiia Surkova, Ekaterina Boichenko, Olga Bibikova, Viacheslav Artyushenko, Jelena Muncan, Roumiana Tsenkova

Currently, the majority of methods to monitor cancer treatment through the analysis of body fluids are based on highly selective detection of single molecules or cells. In this study, we consider analyzing the aqueous medium of liquid samples, that is, the water itself, using aquaphotomics and near-infrared (NIR) spectroscopy for spectral data acquisition and processing in cancer research. Water, as a molecular system, is a rich source of information about the current state of a patient, which can be extracted from near-infrared spectra of liquid samples via simple algorithms based on multivariate data analysis. The reported results, obtained ex vivo from body fluids, demonstrate the potential of aquaphotomics in cancer research.

Resampling as a Robust Measure of Model Complexity in PARAFAC Models
IF 2.3 | Zone 4, Chemistry | Q1 SOCIAL WORK | Pub Date: 2024-09-04 | DOI: 10.1002/cem.3601
Helene Fog Froriep Halberg, Marta Bevilacqua, Åsmund Rinnan

Fluorescence spectroscopy has been applied for analysis of complex samples, such as food and beverages. Parallel factor analysis (PARAFAC) is a well-known decomposition method for fluorescence excitation–emission matrices (EEMs). When the complexity of the system increases, it becomes considerably more difficult to determine the optimal number of PARAFAC components, especially when the fluorophores of the system are unknown. The two commonly applied diagnostics, core consistency and split-half analysis, appear to underestimate the model complexity due to covarying components and local minima, respectively. As a more robust alternative, we propose a resampling approach with multiple initializations and submodel comparisons for estimating the optimal number of PARAFAC components in complex data.

A Non-Linear Model for Multiple Alcohol Intakes and Optimal Designs Strategies
IF 2.3 | Zone 4, Chemistry | Q1 SOCIAL WORK | Pub Date: 2024-08-27 | DOI: 10.1002/cem.3599
Irene Mariñas-Collado, Juan M. Rodríguez-Díaz, M. Teresa Santos-Martín

This study addresses the complex dynamics of alcohol elimination in the human body, which is important in forensic and healthcare settings. Existing models often oversimplify by assuming linear elimination kinetics, which limits their practical application. This study presents a novel non-linear model for estimating blood alcohol concentration after multiple intakes. Initially developed for two separate alcohol intakes, it can be straightforwardly extended to more intakes. Emphasising the significance of accurate parameter estimation, the research underscores the importance of precise experimental design, utilising optimal experimental design (OED) methodologies. Sensitivity analysis of model coefficients and the determination of D-optimal designs, considering correlation structures among observations, reveal a strong linear relationship between support points. This relationship can be used to obtain nearly optimal designs that are highly efficient and much easier to compute.

Population Power Curves in ASCA With Permutation Testing
IF 2.3 | Zone 4, Chemistry | Q1 SOCIAL WORK | Pub Date: 2024-08-26 | DOI: 10.1002/cem.3596
José Camacho, Michael Sorochan Armstrong

In this paper, we revisit the power curves in ANOVA simultaneous component analysis (ASCA) based on permutation testing and introduce the population curves derived from population parameters describing the relative effect among factors and interactions. The relative effect has important practical implications: The statistical power of a given factor depends on the design of other factors in the experiment and not only on the sample size. Thus, understanding the relative power in a specific experimental design can be extremely useful to maximize our chances of success when planning the experiment. In the paper, we derive relative and absolute population curves, where the former represent statistical power in terms of the normalized effect size between structure and noise, and the latter in terms of the sample size. Both types of population curves allow us to make decisions regarding the number and nature (fixed/random) of factors, their relationships (crossed/nested), and the number of levels and replicates, among others, in a multivariate experimental design (e.g., an omics study) during the planning phase of the experiment. We illustrate both types of curves through simulation.
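
The permutation test underlying these power curves can be sketched for the simplest possible case. This is a univariate, single-factor illustration with invented data, not ASCA itself: ASCA works on a multivariate effect matrix, and the between-group sum of squares used here is only an assumed stand-in for its test statistic. Function names are hypothetical.

```python
import random


def between_group_ss(values, groups):
    """Between-group sum of squares for a single factor."""
    grand_mean = sum(values) / len(values)
    by_level = {}
    for v, g in zip(values, groups):
        by_level.setdefault(g, []).append(v)
    return sum(len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
               for vs in by_level.values())


def permutation_p_value(values, groups, n_perm=999, seed=0):
    """P-value of the factor effect obtained by shuffling the group labels."""
    rng = random.Random(seed)
    observed = between_group_ss(values, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if between_group_ss(values, shuffled) >= observed:
            hits += 1
    # The observed labelling counts as one permutation, so p is never zero.
    return (hits + 1) / (n_perm + 1)


# Invented data: two levels of one factor, four replicates each.
values = [1.0, 1.1, 0.9, 1.2, 3.0, 3.1, 2.9, 3.2]
groups = ["a"] * 4 + ["b"] * 4
p = permutation_p_value(values, groups)
print(p)
```

An empirical power estimate at a given effect size and sample size would repeat this test over many simulated datasets and report the fraction of p-values below the chosen significance level.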

{"title":"Population Power Curves in ASCA With Permutation Testing","authors":"José Camacho,&nbsp;Michael Sorochan Armstrong","doi":"10.1002/cem.3596","DOIUrl":"10.1002/cem.3596","url":null,"abstract":"<p>In this paper, we revisit the power curves in ANOVA simultaneous component analysis (ASCA) based on permutation testing and introduce the population curves derived from population parameters describing the relative effect among factors and interactions. The relative effect has important practical implications: The statistical power of a given factor depends on the design of other factors in the experiment and not only of the sample size. Thus, understanding the relative power in a specific experimental design can be extremely useful to maximize our capability of success when planning the experiment. In the paper, we derive relative and absolute population curves, where the former represent statistical power in terms of the normalized effect size between structure and noise, and the latter in terms of the sample size. Both types of population curves allow us to make decisions regarding the number and nature (fixed/random) of factors, their relationships (crossed/nested), and the number of levels and replicates, among others, in an multivariate experimental design (e.g., an omics study) during the planning phase of the experiment. 
We illustrate both types of curves through simulation.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3596","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Chemometric Classification of Motor Oils Using 1H NMR Spectroscopy With Simultaneous Phase and Baseline Optimization
IF 2.3 Zone 4 Chemistry Q1 SOCIAL WORK Pub Date : 2024-08-25 DOI: 10.1002/cem.3598
A. Olejniczak, J. P. Łukaszewicz

Here, we demonstrate that mid-field 1H NMR spectroscopy combined with chemometrics is a powerful tool for the classification and authentication of motor oils (MOs). The 1H NMR data were processed with a new algorithm for simultaneous phase and baseline correction, which, for crowded spectra such as those of refinery products, allowed more accurate estimation of phase parameters than the other literature approaches tested. A principal component analysis (PCA) model based on the unbinned CH3 fingerprint region (0.6–1.0 ppm) enabled the differentiation of hydrocracked and poly-α-olefin-based MOs and was effective in resolving mixtures of these base stocks with conventional base oils. PCA of the 1.0- to 1.14-ppm region enabled the detection of poly(isobutylene) additive and was useful for differentiating between single-grade and multigrade MOs. Non-equidistantly binned 1H NMR data were used to detect the addition of esters and to establish discriminant models for classifying MOs by viscosity grade and by the major categories of synthetic, semisynthetic, and mineral oils. The performances of four classifiers (linear discriminant analysis [LDA], quadratic discriminant analysis [QDA], naïve Bayes classifier [NBC], and support vector machine [SVM]), with and without PCA dimensionality reduction, were compared. In both tasks, SVM performed best, with average error rates of ~2.3% and 8.15% for predicting major MO categories and viscosity grades, respectively. The potential to merge spectra collected from different NMR instruments is discussed for models based on spectral binning. It is also shown that small errors in phase parameters are not detrimental to binning-based PCA models.

Citations: 0