
Journal of Chemometrics: Latest Articles

Multiview Ensemble Learning Framework for Real-Time UV Spectroscopic Detection of Nitrate in Water With Chemometric Modelling
IF 2.1 | CAS Zone 4 (Chemistry) | Q1 SOCIAL WORK | Vol. 39, Issue 5 | Pub Date: 2025-05-01 | DOI: 10.1002/cem.70033
Sagar Rana, Sudeshna Bagchi

The accuracy of detection of nitrate in water for quality monitoring is a significant yet challenging task. To address this, the present work proposes an ensemble machine learning–based chemometric framework for the optical detection of nitrate in water. It incorporates an absorbance-based reagent-less detection of nitrate in water to support the robustness of the model. The absorption spectra were recorded using a portable set-up in the presence and absence of interfering ions. Different interfering ions, namely, nitrite (NO₂⁻), calcium (Ca²⁺), magnesium (Mg²⁺), carbonate (CO₃²⁻), bromide (Br⁻), chloride (Cl⁻) and phosphate (PO₄³⁻), in all possible combinations (binary, ternary, quaternary, quinary, senary and septenary mixtures) are added to the target analyte to validate the real-time application of the proposed algorithm. Under the multiview framework, two models, MVNPM-I and MVNPM-II, i.e., multiview nitrate prediction models, are proposed. MVNPM-I is based on an ensemble of regressors' results, and MVNPM-II uses multiple views of the dataset followed by an ensemble of their results. The performance of the models is assessed using a hold-out validation scheme with 10 repetitions and measured using R² score and mean squared error (MSE). The best results of R² score 0.9978 with a standard deviation of 0.0014 and MSE of 1.1799 with a standard deviation of 0.8639 are obtained using the MVNPM-II model. Further, the performance measures of the proposed models show that they can handle the presence of interfering ions. The algorithm was also tested using real-world samples with an R² score and MSE of 0.9998 and 0.696, respectively. The promising results strengthen the applicability of the proposed method in real-world scenarios.
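The evaluation protocol described in this abstract (one regressor per view, an ensemble of their predictions, and a hold-out scheme repeated 10 times with R² and MSE reported as mean and standard deviation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the synthetic two-channel data, the univariate least-squares base learners, the unweighted averaging and the 30% test fraction are not taken from the paper, and the names `make_synthetic` and `repeated_holdout` are hypothetical.

```python
import random
from statistics import mean, stdev


def fit_line(x, y):
    """Ordinary least squares for y = a*x + b with a single predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return a, my - a * mx


def r2_and_mse(y_true, y_pred):
    """Return (R^2, MSE) for paired true and predicted values."""
    my = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot, ss_res / len(y_true)


def make_synthetic(n, rng):
    """Hypothetical data: two 'views' (absorbance channels) linear in concentration."""
    conc = [rng.uniform(0.0, 50.0) for _ in range(n)]
    view_a = [0.020 * c + rng.gauss(0.0, 0.01) for c in conc]
    view_b = [0.035 * c + rng.gauss(0.0, 0.02) for c in conc]
    return [view_a, view_b], conc


def repeated_holdout(views, y, repeats=10, test_fraction=0.3, seed=0):
    """Repeat a random train/test split and average the ensemble's scores."""
    rng = random.Random(seed)
    n = len(y)
    n_test = int(n * test_fraction)
    r2s, mses = [], []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        # One regressor per view, fitted on the training split only.
        models = [fit_line([v[i] for i in train], [y[i] for i in train])
                  for v in views]
        # Ensemble prediction = unweighted mean of the per-view predictions.
        preds = [mean(a * v[i] + b for (a, b), v in zip(models, views))
                 for i in test]
        r2, mse = r2_and_mse([y[i] for i in test], preds)
        r2s.append(r2)
        mses.append(mse)
    return mean(r2s), stdev(r2s), mean(mses), stdev(mses)


views, conc = make_synthetic(200, random.Random(42))
r2_mean, r2_sd, mse_mean, mse_sd = repeated_holdout(views, conc)
print(f"R2 = {r2_mean:.4f} +/- {r2_sd:.4f}, MSE = {mse_mean:.4f} +/- {mse_sd:.4f}")
```

Reporting the standard deviation across repetitions, as the abstract does, shows how sensitive the scores are to the particular split rather than only how good the best split was.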

Citations: 0
Quantitative Structure–Activity Relationship Modeling Based on Improving Kernel Ridge Regression
IF 2.1 | CAS Zone 4 (Chemistry) | Q1 SOCIAL WORK | Vol. 39, Issue 5 | Pub Date: 2025-05-01 | DOI: 10.1002/cem.70027
Shaimaa Waleed Mahmood, Ghalya Tawfeeq Basheer, Zakariya Yahya Algamal

The quantitative structure–activity relationship (QSAR), an effective and promising model for better understanding the relationship between chemical activity and chemical compounds, is commonly used in modeling chemical datasets. Kernel ridge regression (KRR) has recently attracted the interest of scholars because of its non-iterative methodology for problem solving. KRR is a highly regarded and practical machine learning approach that has successfully tackled classification and regression problems. It is a regression method that uses a nonlinear kernel function to define an inner product in a higher-dimensional transformed space, which allows for good generalization performance based on a regularized least-squares solution. However, the performance of KRR is affected by the choice of the hyperparameter values that define the kernel. Prior methods of determining these hyperparameter values have a major processing cost, use considerable memory, and are accompanied by poor accuracy. The main enhancement highlighted in this paper is therefore an improvement of the coati optimization algorithm by applying elite opposite-based learning to increase the population density in the search space around the optima, for proper selection of the best hyperparameters. To verify the proposed improvement of KRR and compare its performance, seven public chemical datasets were used. Based on several assessment criteria, the results show that the proposed improvement is superior to all the baseline methods regarding the classification performance.
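For readers unfamiliar with KRR, the closed-form solution that makes the method "non-iterative" can be written as below. This is the textbook formulation rather than notation from the paper; the symbols $\lambda$ (regularization strength) and $\gamma$ (kernel width), and the choice of an RBF kernel, are assumptions made for illustration.

```latex
\begin{aligned}
K_{ij} &= k(\mathbf{x}_i, \mathbf{x}_j)
        = \exp\!\left(-\gamma \,\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^{2}\right), \\
\hat{\boldsymbol{\alpha}} &= \left(\mathbf{K} + \lambda \mathbf{I}_n\right)^{-1} \mathbf{y}, \\
\hat{f}(\mathbf{x}) &= \sum_{i=1}^{n} \hat{\alpha}_i \, k(\mathbf{x}_i, \mathbf{x}).
\end{aligned}
```

Under these assumptions, the pair $(\lambda, \gamma)$ is the kind of hyperparameter setting that a metaheuristic such as the coati optimization algorithm would search over, with each candidate scored by refitting the model and measuring its validation error.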

Citations: 0
Correction to “Fast Partition-Based Cross-Validation With Centering and Scaling for $\mathbf{X}^{\mathbf{T}}\mathbf{X}$ and $\mathbf{X}^{\mathbf{T}}\mathbf{Y}$”
IF 2.1 | CAS Zone 4 (Chemistry) | Q1 SOCIAL WORK | Vol. 39, Issue 5 | Pub Date: 2025-04-28 | DOI: 10.1002/cem.70034

Galbo Engstrøm, O.-C. and Holm Jensen, M. (2025), Fast Partition-Based Cross-Validation With Centering and Scaling for $\mathbf{X}^{\mathbf{T}}\mathbf{X}$ and $\mathbf{X}^{\mathbf{T}}\mathbf{Y}$. Journal of Chemometrics, 39: e70008, https://doi.org/10.1002/cem.70008.

On line 27 in Algorithm 7 on page 10, the text to the right reads “Obtain $\mathbf{X}^{\mathbf{csT}}\mathbf{Y}^{\mathbf{csT}}$” but should read “Obtain $\mathbf{X}^{\mathbf{csT}}\mathbf{Y}^{\mathbf{cs}}$”.

In Proposition 15 on page 11, the last equality contains a double hat over $\mathbf{x}_{\mathbf{s}}^{\mathbf{T}}$. It should have been a single hat.

On pages 3 and 4, $\mathcal{P}$ has been written multiple times when $\mathcal{P}[n]$ was intended. Likewise, $\mathcal{V}$ has been written multiple times when $\mathcal{V}[p]$ was intended.

We apologize for the confusion.

Citations: 0
HiBBKA: A Hybrid Method With Resampling and Heuristic Feature Selection for Class-Imbalanced Data in Chemometrics
IF 2.1 | CAS Zone 4 (Chemistry) | Q1 SOCIAL WORK | Vol. 39, Issue 5 | Pub Date: 2025-04-20 | DOI: 10.1002/cem.70029
Ying Guo, Ying Kou, Lun-Zhao Yi, Guang-Hui Fu

In critical domains including medicinal chemistry, biomedicine, metabolomics, and computational toxicology, class imbalance in datasets and poor recognition accuracy for minority classes remain persistent challenges. While previous studies have employed resampling and feature selection techniques to address data imbalance and enhance classification performance, most approaches have focused on single-algorithm solutions rather than hybrid methodologies. Hybrid algorithms offer distinct advantages by integrating the strengths of multiple techniques, thereby providing more comprehensive and efficient solutions for handling imbalanced data. This study proposes HiBBKA, a novel hybrid algorithm combining radial-based under-sampling with SMOTE (RBU-SMOTE) and an improved binary black-winged kite algorithm (iBBKA) for feature selection. The proposed framework operates through two key phases: First, the RBU-SMOTE resampling method synergistically integrates radial-based under-sampling (RBU) with the synthetic minority oversampling technique (SMOTE), effectively addressing class-imbalance distribution while enhancing the quality of synthesized samples. Second, the enhanced iBBKA feature selection algorithm systematically identifies the most discriminative features critical for classification tasks. We comprehensively evaluate RBU-SMOTE and HiBBKA using multiple classifiers across 16 imbalanced datasets, including real-world medical datasets, with particular emphasis on the minority class performance. Experimental results demonstrate that RBU-SMOTE achieves competitive performance compared to existing resampling methods, while the complete HiBBKA framework significantly outperforms state-of-the-art algorithms in overall classification metrics, particularly in the minority class recognition.
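The oversampling half of the resampling step can be illustrated with plain SMOTE, which creates a synthetic minority sample by interpolating between a minority point and one of its nearest minority neighbours. The sketch below shows only that standard interpolation; it is not RBU-SMOTE, it omits the radial-based under-sampling of the majority class, and the function name `smote`, the neighbour count `k` and the toy data are assumptions for illustration.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between minority
    samples and one of their k nearest minority neighbours (plain SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (hypothetical values).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=6)
print(len(new_points))
```

Because every synthetic point is a convex combination of two real minority samples, it always lies on the segment between them, which is why the quality of the chosen neighbours matters for the quality of the synthesized samples.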

Citations: 0
Geographical Influence on Metabolite Profiles of Cupressus torulosa: UPLC-QTOF-MS (Positive Mode) and Chemometric Insights
IF 2.1 | CAS Zone 4 (Chemistry) | Q1 SOCIAL WORK | Vol. 39, Issue 4 | Pub Date: 2025-04-14 | DOI: 10.1002/cem.70031
Radhika Khanna, Khushaboo Bhadoriya, Gaurav Pandey, V. K. Varshney

C. torulosa, known as the Himalayan or Bhutan cypress, is a significant evergreen conifer that typically reaches heights between 20 and 45 m. This species is primarily found in the Himalayan regions of Bhutan, northern India, Nepal, and Tibet. In this study, we utilized ultra-performance liquid chromatography coupled with quadrupole time-of-flight mass spectrometry (UPLC-QTOF-MS) in positive ion mode, along with chemometric analysis, to investigate the metabolomic profiles of C. torulosa needles collected from 14 geographically distinct areas in Uttarakhand and Himachal Pradesh. Various statistical techniques, including ANOVA, Principal Component Analysis (PCA), Hierarchical Cluster Analysis (HCA), violin plots, scatter plots, box-and-whisker plots, and heatmaps, were employed to illustrate the relative quantitative differences among compounds based on their peak intensities across these regions. Our investigation revealed 34 marker compounds consistently detected across all samples (locations). These compounds were screened using rigorous filtering criteria, incorporating a moderated t-test and multiple testing adjustments using the Benjamini–Hochberg false discovery rate (FDR) approach. Furthermore, we pioneered the identification of the phenylpropanoid and flavonoid biosynthesis pathways in C. torulosa, providing new insights into its metabolic profile. This work establishes a foundational reference for future research into the species' metabolome, helping guide studies in areas like genetic diversity, ecological adaptations, and climate resilience in C. torulosa. Mapping these pathways deepens scientific knowledge of C. torulosa's metabolic processes, contributing to a clearer understanding of its unique biochemical makeup.
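The Benjamini–Hochberg adjustment mentioned in this abstract can be sketched in a few lines. The code covers only the FDR step, not the moderated t-test that produces the p-values; the function name `benjamini_hochberg`, the threshold `alpha=0.05` and the example p-values are assumptions for illustration, not values from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return (adjusted p-values, rejection flags) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    rejected = [q <= alpha for q in adjusted]
    return adjusted, rejected


# Hypothetical p-values for eight candidate marker compounds.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
adjusted, rejected = benjamini_hochberg(p_values)
print(sum(rejected), "of", len(p_values), "features pass at FDR 0.05")
```

In this toy example five raw p-values fall below 0.05, but only two survive the adjustment, which is the kind of tightening that makes a marker list more defensible when hundreds of peaks are tested at once.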

Citations: 0
Comprehensive Anomaly Score Rank Based Unsupervised Sample Selection Method
IF 2.1 | CAS Zone 4 (Chemistry) | Q1 SOCIAL WORK | Vol. 39, Issue 4 | Pub Date: 2025-04-08 | DOI: 10.1002/cem.70028
Zhongjiang He, Zhonghai He, Xiaofang Zhang

The process of selecting representative samples is crucial for establishing an accurate calibration model. To enhance the representativeness of the samples, a method for sample selection, utilizing the degree of anomaly as the evaluation criterion, is proposed. Initially, anomaly scores corresponding to various detection methods are obtained to ensure a comprehensive evaluation. These scores are then normalized by the confidence lower limit to establish a consistent scoring criterion. Subsequently, the weights of different detection methods are determined through eigenvector centrality analysis of a graph, where the methods serve as nodes and the similarity acts as weighted edges. Finally, the comprehensive anomaly scores are computed as the sum of weighted scores and are subsequently sorted. Representative samples are selected using a uniformly spaced sampling approach, with the spacing determined by a predefined and provided sample number. The efficacy of the method is validated across different sample sets.
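The final two steps described above (a weighted sum of per-method anomaly scores, followed by uniformly spaced picks along the sorted scores) can be sketched as follows. The normalization by the confidence lower limit and the eigenvector-centrality weighting are not reproduced here: the scores are assumed to be already normalized, and the weights are placeholder values. The function name `select_uniformly_by_score` and all data are hypothetical.

```python
def select_uniformly_by_score(scores_per_method, weights, n_select):
    """Combine per-method scores, rank the samples, and pick n_select of
    them at evenly spaced positions along the ranking."""
    n_samples = len(scores_per_method[0])
    # Comprehensive score = weighted sum over detection methods.
    combined = [sum(w * s[i] for w, s in zip(weights, scores_per_method))
                for i in range(n_samples)]
    ranked = sorted(range(n_samples), key=lambda i: combined[i])
    if n_select == 1:
        return [ranked[n_samples // 2]]
    # Spacing follows from the requested number of samples.
    step = (n_samples - 1) / (n_select - 1)
    return [ranked[round(k * step)] for k in range(n_select)]


# Two hypothetical detection methods scoring nine samples.
method_1 = [0.1, 0.9, 0.3, 0.5, 0.7, 0.2, 0.8, 0.4, 0.6]
method_2 = [0.2, 0.8, 0.2, 0.6, 0.6, 0.1, 0.9, 0.5, 0.5]
weights = [0.6, 0.4]  # placeholder weights, not centrality-derived

selected = select_uniformly_by_score([method_1, method_2], weights, n_select=3)
print(selected)
```

Sampling at even intervals along the ranking keeps both typical and unusual samples in the calibration set. Note that this sketch can return duplicate indices if `n_select` exceeds the number of samples.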

Citations: 0
Data Quality: Importance of the ‘before analysis’ domain (Theory of Sampling, TOS)
IF 2.1 | CAS Zone 4 (Chemistry) | Q1 SOCIAL WORK | Vol. 39, Issue 4 | Pub Date: 2025-04-06 | DOI: 10.1002/cem.70025

Data analysts/chemometricians are part of a scientific collegium covering three distinct domains: i) sampling, ii) analysis and iii) data modelling, which collectively influence ‘data quality’. There is much more to data quality than analytical uncertainty. There are many situations where analysis is to be made of heterogeneous materials/batches/lots/flowing streams, which need to be sampled appropriately before analysis, following an often long and complex pathway ‘from-lot-to-aliquot’. In most cases, sampling and sub-sampling will dominate the total Measurement Uncertainty budget (MU_total). Left-out MU_sampling contributions may easily overwhelm the Total Analytical Error (TAE) uncertainty by factors of 5, 10, 25 or higher as a function of the specific heterogeneity characteristics of the materials and systems targeted, and of the sampling procedure used (grab vs. composite sampling). The focus here is on the consequences of unwittingly ignoring the uncertainties originating in these domains, which will, for example, adversely influence bilinear component directions (reducing model accuracy) as well as RMSE estimates reflecting precision (analyte concentration prediction, classification, time series prediction). Along the way, the article also clears up an evergreen mistake: contrary to many beliefs, ‘more data’ will not automatically reduce the magnitude of an unsatisfactory performance RMSE. It is shown how the Theory of Sampling (TOS) is the only guarantor of representative sampling in the critical ‘before analysis’ domain. This article introduces the essential minimum TOS competence which must be mastered by stakeholders from all three domains. The conceptual elements in the TOS system can be visualised as a graphic overview:

Kim H. Esbensen has been professor at three universities (National Geological Survey of Denmark and Greenland (2010–2015), Aalborg University, Denmark (2001–2010), Telemark Institute of Technology, Norway (1990–2000)) and professeur associé, Université du Québec à Chicoutimi, before switching to a quest as an independent consultant in 2015. He is a member of several scientific societies and has published widely across several scientific fields. He is the author of a widely used textbook in Multivariate Data Analysis (chemometrics), and in 2020 published “Introduction to the Theory and Practice of Sampling”. He was chairman of the taskforce responsible for the world's first horizontal (matrix-independent) sampling standard, DS 3077:2024. Esbensen is the founding editor of “Sampling Science and Technology (SST)” (https://www.sst-magazine.info/issues/). He can be reached at his homepage, https://kheconsult.com/

Citations: 0
Data Quality: Importance of the ‘Before Analysis’ Domain [Theory of Sampling (TOS)]
IF 2.1 Zone 4 (Chemistry) Q1 SOCIAL WORK Pub Date: 2025-04-06 DOI: 10.1002/cem.70021
Kim H. Esbensen

Data Quality: what is it, where does it originate, how does it influence data modelling, and what can chemometricians do about it? The ‘before analysis’ domain is prone to sampling errors, and the resulting uncertainties affect the quality of both analysis and data analysis/data modelling. Nonrepresentative sampling of heterogeneous materials, batches, lots and process streams ‘before analysis’ contributes significantly to the total measurement uncertainty, MU_total = MU_sampling + MU_analysis. The total sampling error (TSE) can dominate the total analytical error (TAE) by factors of 5, 10 or higher, depending on the degree of material heterogeneity encountered and on the specific sampling procedure used to produce the final analytical aliquot, which is the only material actually analysed. The analytical aliquot is the physical manifestation of crossing the boundary from the ‘before analysis’ (sampling) domain to the domain of analysis. Representativity of the analytical aliquot, and thus of the analytical results with respect to the original target batch/lot/process stream, can only be guaranteed by invoking the sampling competence stipulated by the theory of sampling (TOS). Primary sampling is the most important stage in the full lot-to-analysis pathway and quantitatively dominates MU_total (although subsequent subsampling stages can also be significant). If the sources of adverse sampling error effects have not been eliminated, the sampling process is biased and MU_total will be unnecessarily inflated. TOS offers ways and means to deal actively with a potential sampling bias (which is fundamentally different from the analytical bias). Overlooking, or deliberately ignoring, the need to deal appropriately with sampling effects constitutes a lack of due diligence, which bears critically on the QC/QA demands on both analysis and data analysis/modelling. This article presents all uncertainty contributions in the lot-to-analysis-to-data-modelling pathway, which must be identified and managed, eliminated or maximally reduced, in order to document a fully minimised MU_total. Data analysts/chemometricians are part of a scientific collegium covering all three domains (sampling, analysis and data modelling), which are collectively responsible for ‘data quality’. This comprehensive scope has serious implications for the current PAT paradigm, whose foundation turns out to need significant reform regarding a key process sampling aspect, regardless of whether physical samples are extracted or PAT sensor spectra are acquired. This article introduces the essential minimum TOS competence that must be mastered by stakeholders from all three domains.
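The budget MU_total = MU_sampling + MU_analysis quoted in the abstract can be illustrated with a small numerical sketch. It is not taken from the article: it assumes the two contributions are independent and expressed as standard deviations, so that their variances add, and the function name `total_sd` and all numbers are hypothetical. The factor of 10 mirrors the "factors of 5, 10 or higher" mentioned above.

```python
import math


def total_sd(sd_sampling: float, sd_analysis: float, n_replicate_analyses: int = 1) -> float:
    """Combined standard deviation, assuming independent contributions.

    Assumption (not from the abstract): variances add, and replicate
    analyses of the SAME aliquot average down only the analytical part.
    """
    var_analysis = sd_analysis ** 2 / n_replicate_analyses
    return math.sqrt(sd_sampling ** 2 + var_analysis)


# Hypothetical values: sampling SD is 10 times the analytical SD.
sd_analysis = 1.0
sd_sampling = 10.0 * sd_analysis

single = total_sd(sd_sampling, sd_analysis)                         # one analysis
many = total_sd(sd_sampling, sd_analysis, n_replicate_analyses=100)  # 100 analyses

print(f"total SD with 1 analysis:    {single:.4f}")
print(f"total SD with 100 analyses:  {many:.4f}")
print(f"relative improvement:        {100 * (1 - many / single):.2f} %")
```

Under these assumptions, repeating the analysis a hundred times barely changes the total, because the sampling term is untouched. This is one way to read the point that more analytical effort alone cannot compensate for a neglected sampling contribution.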

Citations: 0
Expandable Diffusion Map–Based Weighted k-Nearest Neighbor Technique for Multimode Batch Process Monitoring
IF 2.1 Zone 4 (Chemistry) Q1 SOCIAL WORK Pub Date: 2025-04-05 DOI: 10.1002/cem.70020
Liwei Feng, Yifei Wu, Shaofeng Guo, Yu Xing, Yuan Li

The diffusion map–based k-nearest neighbor (DM-kNN) rule faces two challenges in multimode batch process monitoring. First, the DM method has difficulty projecting new samples: feature extraction must be repeated on the training samples, which is time-consuming, and faulty samples may be merged with normal samples and modeled together, which does not meet the requirements of fault detection. Second, DM-kNN monitors poorly when the modes of a multimode process differ significantly in variance. This paper proposes a technique called the expandable DM–based weighted k-nearest neighbor (EDM-WkNN) to solve these two issues. The expandable DM constructs a local projection matrix to project new samples. The effect of variance differences between modes is eliminated by introducing weighted distances into the monitoring statistic. We compare EDM-WkNN with classical fault detection methods through numerical examples and the fed-batch fermentation penicillin (FBFP) process. Our experiments confirm that the EDM-WkNN method effectively monitors faults in multimode batch processes.
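The role of weighted distances in a kNN monitoring statistic can be sketched on toy data. This is not the EDM-WkNN method: the diffusion-map projection is omitted, and the weighting used here (dividing each neighbor distance by that neighbor's own mean kNN distance in the training set) is an assumption chosen only to show why unweighted distances struggle when modes differ in variance. All names, the grid data and the max-based control limit are hypothetical.

```python
import math


def nearest(x, data, k, exclude=None):
    """(distance, index) pairs of the k training samples closest to x."""
    pairs = sorted((math.dist(x, y), j) for j, y in enumerate(data) if j != exclude)
    return pairs[:k]


def local_scales(data, k):
    """Mean kNN distance of every training sample (its local density scale)."""
    return [sum(d for d, _ in nearest(x, data, k, exclude=i)) / k
            for i, x in enumerate(data)]


def plain_stat(x, data, k, exclude=None):
    """Classical kNN statistic: sum of squared distances to k neighbors."""
    return sum(d ** 2 for d, _ in nearest(x, data, k, exclude))


def weighted_stat(x, data, scales, k, exclude=None):
    """Assumed weighting: each distance is divided by the neighbor's local scale."""
    return sum((d / scales[j]) ** 2 for d, j in nearest(x, data, k, exclude))


def grid(center, step):
    """Deterministic 5 x 5 grid standing in for one operating mode."""
    return [(center + i * step, center + j * step)
            for i in range(-2, 3) for j in range(-2, 3)]


k = 3
train = grid(0.0, 0.1) + grid(20.0, 2.0)   # tight mode + wide mode
scales = local_scales(train, k)

# Simplistic control limits: largest leave-one-out statistic on training data.
plain_limit = max(plain_stat(x, train, k, exclude=i) for i, x in enumerate(train))
weighted_limit = max(weighted_stat(x, train, scales, k, exclude=i)
                     for i, x in enumerate(train))

fault = (0.6, 0.0)     # far outside the tight mode, relative to its spread
normal = (21.0, 21.0)  # inside the wide mode

print("plain kNN flags fault:   ", plain_stat(fault, train, k) > plain_limit)
print("weighted kNN flags fault:", weighted_stat(fault, train, scales, k) > weighted_limit)
print("weighted kNN flags normal:", weighted_stat(normal, train, scales, k) > weighted_limit)
```

In this toy setting the unweighted limit is set by the wide mode, so a sample that is abnormal for the tight mode passes unnoticed, while the scaled statistic puts both modes on a comparable footing. A practical scheme would replace the max-based limit with a quantile or density-based limit.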

Citations: 0
Smart Monitoring Solutions for Real-Time Water pH Regulation in Aquatic Ecotoxicology
IF 2.1 Zone 4 (Chemistry) Q1 SOCIAL WORK Pub Date: 2025-04-03 DOI: 10.1002/cem.70024
Usman Ibrahim, Nasir Abbas, Muhammad Riaz, Tahir Mahmood

This study designs a statistical process control tool that effectively detects small and moderate shifts in process parameters, addressing challenges in quality monitoring. The proposed control chart employs advanced statistical detection techniques to enhance sensitivity while reducing false alarms, thus improving detection performance in various applications. The methodology is applied in a real-life context within an aquatic ecotoxicology laboratory, where daily monitoring of water pH levels is essential for safeguarding the health of sensitive aquatic organisms, such as mysids. The laboratory environment is meticulously controlled to simulate natural conditions, and applying the proposed control chart ensures that any deviation from the optimal pH level is detected promptly, thereby maintaining water quality and supporting the reliability of experimental outcomes. The paper comprehensively evaluates the performance of the proposed control chart in both zero-state and steady-state conditions, offering valuable insights for practitioners in the field. We present empirical evidence demonstrating that the proposed control chart significantly outperforms traditional control charts, including Shewhart, CUSUM, and EWMA, particularly in detecting small to moderate shifts in water pH levels. Furthermore, we provide optimal parameter settings tailored to specific monitoring scenarios, enhancing the applicability of the proposed control chart for quality control in laboratory environments.
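The abstract does not specify the proposed chart, so the sketch below only illustrates two of the traditional baselines it names, Shewhart and EWMA, on made-up pH readings. The target pH of 8.0, the standard deviation of 0.1, the readings themselves and the settings `lam=0.2` and `width=3.0` are hypothetical and are not the paper's recommended parameters.

```python
import math


def shewhart_alarms(readings, target, sigma, width=3.0):
    """Indices of readings outside target +/- width * sigma."""
    return [i for i, x in enumerate(readings) if abs(x - target) > width * sigma]


def ewma_alarms(readings, target, sigma, lam=0.2, width=3.0):
    """Indices where the EWMA statistic leaves its time-varying control limits.

    z_t = lam * x_t + (1 - lam) * z_{t-1}, with z_0 = target.
    """
    z = target
    alarms = []
    for t, x in enumerate(readings, start=1):
        z = lam * x + (1.0 - lam) * z
        half_width = width * sigma * math.sqrt(
            lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        )
        if abs(z - target) > half_width:
            alarms.append(t - 1)  # zero-based index of the reading
    return alarms


# Hypothetical daily pH readings: 10 in-control days, then a 1.5-sigma upward shift.
target, sigma = 8.0, 0.1
pattern = [0.00, 0.05, -0.05, 0.02, -0.02, 0.00, 0.03, -0.03, 0.01, -0.01]
readings = [target + d for d in pattern] + [target + 0.15 + d for d in pattern]

shewhart = shewhart_alarms(readings, target, sigma)
ewma = ewma_alarms(readings, target, sigma)

print("Shewhart alarm indices:", shewhart)
print("EWMA alarm indices:    ", ewma)
```

On these readings no single value exceeds the 3-sigma Shewhart limits, whereas the EWMA statistic accumulates the small sustained shift and eventually signals. That sensitivity to small shifts is the property the abstract says the proposed chart improves on further.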

Citations: 0