Chemometrics and Intelligent Laboratory Systems: Latest Articles

Integrated monitoring and diagnosis of industrial processes based on causality synergistic and unique decomposition
IF 3.8, Zone 2 (Chemistry), Q2 AUTOMATION & CONTROL SYSTEMS, Pub Date: 2025-11-30, DOI: 10.1016/j.chemolab.2025.105593
Shijie Zhu, Qi Zhang, Shuai Li, Yang Fu, Dongni Jia, Yigeng Wang
Causality mining plays a crucial role in monitoring complex industrial processes. However, incomplete extraction of quality-related information can reduce the monitoring accuracy for quality-related faults, and uncertain causal relationships during root-variable mining can lead to incorrect fault diagnoses. To address these problems, we decompose the causal relationships between variables into synergistic and unique components and propose an integrated monitoring and diagnosis approach for industrial processes based on this causality synergistic and unique decomposition. First, we use Granger causality to preliminarily identify quality-related features, then exploit the synergistic effect of causal relationships to strengthen their extraction and account for their complex interdependence. Second, because synergistic causality exists among variables in different variable groups, their dynamic characteristics must be captured and modeled to ensure monitoring accuracy. We extend fault monitoring from quality variables to process variables and thereby achieve integrated monitoring. Finally, we exploit causal uniqueness to identify the root cause of a fault, which is key to precise and rapid diagnosis in complex and uncertain industrial processes. The feasibility and effectiveness of the proposed method were validated in two scenarios: the benchmark Tennessee Eastman (TE) chemical process and an industrial case study of low-grade iron ore beneficiation.
Volume 269, Article 105593
Citations: 0
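The abstract above names Granger causality as its preliminary screening step. As a rough illustration only, the sketch below runs a standard bivariate, single-lag Granger F-test on synthetic series; it does not reproduce the authors' synergistic and unique decomposition. The function names `ols_rss` and `granger_f`, the lag order, and the simulated data are all assumptions introduced here.

```python
import random

def ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit via the normal equations."""
    k = len(rows[0])
    # Augmented system [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))

def granger_f(cause, effect, lag=1):
    """F statistic for the hypothesis that `cause` Granger-causes `effect`."""
    times = range(lag, len(effect))
    y = [effect[t] for t in times]
    # Restricted model: intercept plus past values of the effect only.
    restricted = [[1.0] + [effect[t - k] for k in range(1, lag + 1)] for t in times]
    # Unrestricted model: additionally uses past values of the candidate cause.
    unrestricted = [row + [cause[t - k] for k in range(1, lag + 1)]
                    for row, t in zip(restricted, times)]
    rss_r = ols_rss(restricted, y)
    rss_u = ols_rss(unrestricted, y)
    dof = len(y) - len(unrestricted[0])
    return ((rss_r - rss_u) / lag) / (rss_u / dof)

# Synthetic data (hypothetical): x drives y with a one-step delay.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.0]
for t in range(1, 200):
    y.append(0.5 * y[-1] + 0.8 * x[t - 1] + rng.gauss(0, 0.1))

f_xy = granger_f(x, y)  # expected to be large: x helps predict y
f_yx = granger_f(y, x)  # expected to be small: y does not help predict x
print(f_xy, f_yx)
```

In a process-monitoring setting, a process variable with a large F statistic toward a quality variable would be a candidate quality-related feature; the threshold for "large" would come from the F distribution and is left out of this sketch.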
Online monitoring of phytochemical dynamics in black tea processing using MIP-driven classifier models
IF 3.8, Zone 2 (Chemistry), Q2 AUTOMATION & CONTROL SYSTEMS, Pub Date: 2025-11-30, DOI: 10.1016/j.chemolab.2025.105611
Debanjana Ghosh, Debangana Das, Shreya Nag, Runu Banerjee Roy
Black tea processing alters phytochemical constituents across multiple stages, and the tea quality index varies with these biomarkers. In this study, modified classifier models were used to monitor two key biomarkers, catechin and epigallocatechin gallate (EGCG), throughout the five distinct stages of black tea processing for tea quality estimation. Catechin- and EGCG-selective molecularly imprinted polymer (MIP) electrodes were prepared, and their differential pulse voltammetry (DPV) responses were recorded at the five processing stages. The DPV responses were analyzed to discriminate the processing stages by catechin and EGCG content using a stacked model incorporating four classification algorithms (Random Forest, K-Nearest Neighbors, Gaussian Naive Bayes (NB), and Gradient Boosting) and an Artificial Neural Network (ANN) classifier model. The proposed models performed satisfactorily in classifying five different stages of fermentation for four different tea samples, with accuracies of 98 % for catechin and 95 % for EGCG. Principal Component Analysis (PCA) plots show that the sensors resolve each stage of tea processing as a distinct cluster. The sensor response also showed a consistent pattern of change in catechin and EGCG content across the stages of tea processing.
Volume 269, Article 105611
Citations: 0
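Gaussian Naive Bayes is one of the four base learners named in the abstract above. The sketch below implements only that single classifier from scratch, not the stacked ensemble or the ANN, and applies it to synthetic two-feature data standing in for catechin and EGCG peak currents. The stage means, the noise level, and the helper names `fit_gnb` and `predict_gnb` are hypothetical; none of the numbers are measured values.

```python
import math
import random
from collections import defaultdict

def fit_gnb(samples, labels):
    """Estimate per-class prior, feature means and feature variances."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    model = {}
    for y, rows in grouped.items():
        n = len(rows)
        cols = list(zip(*rows))
        means = [sum(col) / n for col in cols]
        # A small floor keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(cols, means)]
        model[y] = (n / len(samples), means, variances)
    return model

def predict_gnb(model, x):
    """Return the class with the highest log-posterior for sample x."""
    def log_post(params):
        prior, means, variances = params
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))
    return max(model, key=lambda y: log_post(model[y]))

# Synthetic stand-in for DPV peak currents at five processing stages
# (hypothetical means and noise, not data from the study).
rng = random.Random(1)
samples, labels = [], []
for stage in range(5):
    for _ in range(40):
        samples.append([rng.gauss(10 - 2 * stage, 0.3),
                        rng.gauss(8 - 1.5 * stage, 0.3)])
        labels.append(stage)

# Hold out every fourth sample for testing.
train = [i for i in range(len(samples)) if i % 4 != 0]
test = [i for i in range(len(samples)) if i % 4 == 0]
model = fit_gnb([samples[i] for i in train], [labels[i] for i in train])
hits = sum(predict_gnb(model, samples[i]) == labels[i] for i in test)
accuracy = hits / len(test)
print(accuracy)
```

In a stacked model, the class predictions (or class probabilities) of several such base learners would become the input features of a second-level classifier; that second level is not shown here.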
Improved variable reduction in Partial Least Squares modelling by global-minimum error reproducible Uninformative-Variable Elimination
IF 3.8, Zone 2 (Chemistry), Q2 AUTOMATION & CONTROL SYSTEMS, Pub Date: 2025-11-29, DOI: 10.1016/j.chemolab.2025.105603
Jan P.M. Andries, Gerjen H. Tinnevelt, Yvan Vander Heyden
The well-known Uninformative-Variable Elimination for Partial Least Squares, denoted as UVE-PLS, is not reproducible with respect to the selected variables. Additionally, in UVE, variables are selected at the first minimum of the curve of the root mean squared error of cross-validation (RMSECV) plotted against the number of retained variables. This mostly results in a rather large number of selected variables. Therefore, there is a need for a new and reproducible UVE method with better selective and preferably also better predictive abilities. Consequently, the Global-Minimum Error Reproducible Uninformative-Variable Elimination method, denoted as GME-RUVE, is proposed and tested.
In the GME-RUVE method, the main characteristics of two existing methods, i.e. Jack-knife-based Partial Least Squares Regression (JK-PLSR) and Global-Minimum Error Uninformative-Variable Elimination (GME-UVE), are combined. JK-PLSR can be considered a reproducible version of the original UVE method.
In GME-RUVE, as in the JK-PLSR method, no artificial random variables are added to the X matrix, and the significance of the PLS regression coefficients is first determined by jack-knifing. Secondly, as in the GME-UVE method, either the global minimum or the critical RMSECV is used to select the variables. The performance of the new GME-RUVE method is investigated using four datasets with multivariate profiles, i.e. either simulated profiles, NIR spectra or theoretical molecular descriptor profiles, resulting in 12 profile-response (X-y) combinations.
The predictive performance of GME-RUVE using the global RMSECV minimum, and both the selective and predictive performances of GME-RUVE using the critical RMSECV, are significantly better than those of the JK-PLSR method using the first local RMSECV minimum and those of the existing UVE method. The selective and predictive performances of the new GME-RUVE method are also much better than those of the existing GME-UVE method. Moreover, the variables selected by GME-RUVE are chemically meaningful.
Volume 269, Article 105603
Citations: 0
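The difference between stopping at the first local RMSECV minimum and at the global minimum, which the abstract above turns on, can be shown with a few lines of code. This is a minimal sketch on an invented RMSECV curve: the `retained` and `rmsecv` values are hypothetical, the curve is assumed to be ordered by successive elimination steps, and the critical-RMSECV criterion is not covered because the abstract does not define it.

```python
def first_local_minimum(curve):
    """Index of the first point after which the error starts to rise."""
    for i in range(len(curve) - 1):
        if curve[i] < curve[i + 1]:
            return i
    return len(curve) - 1

def global_minimum(curve):
    """Index of the lowest error anywhere on the curve."""
    return min(range(len(curve)), key=curve.__getitem__)

# Hypothetical elimination path: number of retained variables at each step
# and the cross-validated error of the PLS model built on them.
retained = [100, 80, 60, 40, 20, 10, 5]
rmsecv = [0.50, 0.46, 0.48, 0.44, 0.40, 0.43, 0.60]

first = first_local_minimum(rmsecv)
best = global_minimum(rmsecv)
print("first local minimum:", retained[first], "variables")
print("global minimum:", retained[best], "variables")
```

On this toy curve the first local minimum keeps many more variables than the global minimum, which mirrors the abstract's remark that the first-minimum rule tends to select rather large numbers of variables.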
A multi-scale approach integrating hyperspectral system for tracing the origin of agricultural products
IF 3.8, Zone 2 (Chemistry), Q2 AUTOMATION & CONTROL SYSTEMS, Pub Date: 2025-11-27, DOI: 10.1016/j.chemolab.2025.105592
Yuqi Ren, He Wang, Chongbo Yin, Hong Men, Yan Shi, Jingjing Liu
Agricultural products of the same variety can differ in quality, appearance, and nutritional value due to variations in climate, soil, and other growth conditions. To support reliable and sustainable origin traceability, we propose a non-destructive framework using hyperspectral data. Spectral information for rice and peanut samples from multiple production regions was acquired using a GaiaSorter hyperspectral imaging system. This technique can rapidly detect chemical bonds and functional groups, and differences in these features reflect the overall microstructural quality of agricultural products from different origins. A novel Quadrangle Attention with Deformation (QAD) module was designed to enhance multi-scale feature learning. The module applies geometric transformations within local windows and incorporates relative positional encoding to capture multi-scale receptive-field information, thereby better modeling relationships between spectral bands. By embedding the QAD module into a separable-convolution backbone, we developed the Quadrangle Attention with Deformation Network (QAD-Net) for precise origin identification. On two benchmark datasets, QAD-Net achieved state-of-the-art accuracy, reaching 99.66 ± 0.57 % on peanuts and 99.57 ± 0.65 % on rice, outperforming existing models. This work demonstrates the potential of QAD-Net as a fast, accurate, and non-destructive tool for hyperspectral origin traceability, with significant implications for on-site quality supervision, authenticity verification, and sustainable market regulation.
Volume 269, Article 105592
Citations: 0
Explainable machine learning enables robust evaluation of extracted ion chromatograms in LC–MS metabolomics
IF 3.8, Zone 2 (Chemistry), Q2 AUTOMATION & CONTROL SYSTEMS, Pub Date: 2025-11-26, DOI: 10.1016/j.chemolab.2025.105591
Juehong Dai, Liheng Dong, Jingjing Xu, Lingli Deng, Lei Guo, Jiyang Dong
Reliable evaluation of extracted ion chromatograms (EICs) remains a persistent challenge in LC–MS metabolomics, as inaccuracies in peak identification can profoundly impact subsequent data analysis and interpretation. While recent deep learning approaches show promise, their computational burden, limited generalizability, and lack of interpretability hinder broad adoption in routine analytical workflows. To address these limitations, we introduce EXACT-EIC (EXplainable Assessment of Chromatogram qualiTy for EICs), a lightweight, explainable machine learning framework. EXACT-EIC employs a carefully designed set of 34 handcrafted features to perform two critical tasks: binary classification of EICs (peak vs. noise) and quantitative quality scoring. Benchmarking on curated in-house and public test sets showed that EXACT-EIC achieved 95.2 % accuracy and 98.1 % recall for classification. For quantitative assessment, it attained a mean absolute error of 0.70 on a 1–10 expert-assigned quality scale. These results consistently outperformed state-of-the-art deep learning methods including PeakOnly and QuanFormer. Furthermore, Shapley Additive exPlanations (SHAP) analysis quantified the contribution of key chromatographic features (e.g., apex-boundary ratio, distribution entropy) to model predictions, offering transparent mechanistic insights absent in "black-box" architectures. By combining robustness, interpretability, and computational efficiency, EXACT-EIC facilitates reliable EIC evaluation across diverse platforms and experimental conditions. It provides a practical, deployable solution for automated quality control and confident metabolite annotation, addressing a critical need in untargeted LC–MS metabolomics workflows.
Volume 269, Article 105591
Citations: 0
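The abstract above mentions two handcrafted chromatographic features by name, the apex-boundary ratio and the distribution entropy, without defining them. The sketch below shows one plausible way such features might be computed from an intensity trace. Both definitions are assumptions made for illustration (apex intensity over the mean of the two edge intensities, and Shannon entropy of the normalized intensities), and the two example traces are invented.

```python
import math

def apex_boundary_ratio(intensities):
    """Assumed definition: apex intensity over the mean of the two edge points."""
    apex = max(intensities)
    boundary = (intensities[0] + intensities[-1]) / 2
    return apex / max(boundary, 1e-12)  # guard against a zero baseline

def distribution_entropy(intensities):
    """Assumed definition: Shannon entropy of the normalized intensity profile."""
    total = sum(intensities)
    probs = [v / total for v in intensities if v > 0]
    return -sum(p * math.log(p) for p in probs)

# Hypothetical traces: a well-shaped peak and a flat noisy baseline.
peak = [1, 2, 5, 20, 60, 100, 60, 20, 5, 2, 1]
noise = [10, 12, 9, 11, 10, 13, 9, 10, 12, 11, 10]

for name, trace in (("peak", peak), ("noise", noise)):
    print(name, apex_boundary_ratio(trace), distribution_entropy(trace))
```

Under these assumed definitions a genuine peak gives a high apex-boundary ratio and a low entropy, while a noise trace gives a ratio near one and an entropy near its maximum, so even two such features already separate the toy examples.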
A classification model for early detection of breast cancer by Raman spectroscopy based on categorical embedding transformer
IF 3.8, Zone 2 (Chemistry), Q2 AUTOMATION & CONTROL SYSTEMS, Pub Date: 2025-11-24, DOI: 10.1016/j.chemolab.2025.105589
Chaoyuan Hou, Fei Xie, Guohua Wu, Wenting Yu, Houpu Yang, Liu Yang, Xuewen Long, Longfei Yin, Shu Wang
Raman spectroscopy combined with deep learning is now widely used in disease screening. The Transformer is an important deep learning architecture that has excelled in several areas thanks to mechanisms such as self-attention. However, because it was originally designed for natural language processing, the Transformer has drawbacks when processing spectral data, such as high computational complexity and a tendency to overfit small datasets. In this study, we propose a spectral classification model called Categorical Embedding Transformer (CET) and apply it, combined with Raman spectroscopy, to the screening of breast cancer and ductal carcinoma in situ. The core principle of the CET model is to embed class labels as fixed-dimensional vectors and update them as learnable parameters during training. The CET model also removes the positional encoding in the Transformer encoder and the initial linear layer used for dimensionality reduction or expansion, while retaining the structure used for feature extraction and dimensionality reduction of spectral data, so that these capabilities are preserved while computational complexity is reduced. Finally, the dot product is used to calculate the similarity between each class vector and the dimensionality-reduced spectrum, and the cross-entropy loss function is used to maximize the dot-product similarity of the true class during training. The model achieved 100 % accuracy on the validation set and 98.2 % accuracy on the unknown test set, outperforming the other models compared.
Volume 268, Article 105589
Citations: 0
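The classification head described in the abstract above (learnable class vectors, dot-product similarity, cross-entropy loss) can be sketched in a few lines. This is an illustration of that general idea only, not the CET model: the encoder is omitted, the three-dimensional feature, the three class vectors, the learning rate and the function names are all hypothetical.

```python
import math

def class_logits(feature, class_vectors):
    """Dot-product similarity between the feature and each class vector."""
    return [sum(f * c for f, c in zip(feature, vec)) for vec in class_vectors]

def cross_entropy(logits, true_class):
    """Softmax cross-entropy, computed with the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[true_class]

def sgd_step(feature, class_vectors, true_class, lr=0.1):
    """One gradient step on the class vectors (the feature is held fixed)."""
    logits = class_logits(feature, class_vectors)
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Gradient of the loss w.r.t. class vector k is (p_k - 1[k == true]) * feature.
    return [[c - lr * (p - (k == true_class)) * f for c, f in zip(vec, feature)]
            for k, (vec, p) in enumerate(zip(class_vectors, probs))]

# Hypothetical reduced spectrum and class embeddings (three classes assumed).
feature = [0.9, 0.1, -0.3]
class_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 1.0]]
true_class = 1

logits = class_logits(feature, class_vectors)
predicted = max(range(len(logits)), key=logits.__getitem__)
loss_before = cross_entropy(logits, true_class)
class_vectors = sgd_step(feature, class_vectors, true_class)
loss_after = cross_entropy(class_logits(feature, class_vectors), true_class)
print(predicted, loss_before, loss_after)
```

Minimizing the cross-entropy pulls the true class vector toward the feature and pushes the others away, which is how the loss "maximizes the dot-product similarity of the true class" in this simplified setting.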
Modeling and statistical analysis of cancer drugs using M-polynomial indices for their characteristics
IF 3.8, Zone 2 (Chemistry), Q2 AUTOMATION & CONTROL SYSTEMS, Pub Date: 2025-11-24, DOI: 10.1016/j.chemolab.2025.105590
Qasem M. Tawhari, Muhammad Naeem, Saba Maqbool, Syed Muhammad Kashif Raza, Adnan Aslam
This study computes M-polynomial indices for Doxorubicin and Mitoxantrone, two widely used anticancer drugs from the anthracycline and anthracenedione classes, respectively. Doxorubicin, a potent topoisomerase II inhibitor, is commonly employed in treating various cancers, including breast cancer, ovarian cancer, and leukemia. Mitoxantrone, with its unique DNA-intercalating properties, is effective against acute myeloid leukemia, breast cancer, and non-Hodgkin’s lymphoma. We derived the M-polynomial indices by partitioning the edges of each molecular graph according to the degrees of their end vertices, obtained from the adjacency matrix. A Python algorithm operating on the adjacency matrix computes the indices efficiently, reducing calculation time from days to minutes and eliminating human error. Simple linear regression models built in SPSS are used to establish quantitative structure-property relationships (QSPR) and predict the physical properties of cancer drugs. Our findings show that M-polynomial indices accurately predict physical properties, providing important insights into the structural requirements for maximum anticancer action. In addition, we propose a model for each physical property. This study aids in the development of new cancer therapies and the prediction of physical properties for uncharacterized drugs.
Chemometrics and Intelligent Laboratory Systems, vol. 268, Article 105590.
Citations: 0
Recent developments in evolutionary computation for generative adversarial networks: A comprehensive survey
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2025-11-20 DOI: 10.1016/j.chemolab.2025.105587
Atifa Rafique, Xue Yu, Kashif Iqbal
In recent years, evolutionary generative adversarial networks (EGANs) have emerged as a research area that merges generative adversarial networks (GANs), which generate realistic data, with evolutionary computation (EC) techniques, which optimize solutions by drawing inspiration from nature. In this review, we examine the synergy between EC and GANs, with an emphasis on EGANs, an emerging direction with the potential to enable many practical applications. We first introduce the key concepts of GANs and EC in detail to illustrate how they combine to model novel data efficiently while staying consistent with reality. We then describe how EC techniques have been incorporated into GAN architectures to improve both performance and diversity. The paper presents a thorough analysis of EGANs across various domains, where they have proven effective against real-world problems such as data scarcity, mode collapse, and training instability. We also consider the limitations of EGANs and suggest methods for addressing them. Looking ahead, we present new research directions for EGANs and suggest that they could transform artificial intelligence (AI) and advance cutting-edge applications in personalized content generation, virtual reality (VR) experiences, and medical diagnosis. In conclusion, this survey aims to provide a solid foundation for EGAN research. By combining two powerful paradigms, GANs and EC, EGANs represent a promising trajectory for AI and aim to address open challenges in data synthesis and optimization.
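As a rough illustration of the EC side of this combination, the sketch below runs a generic mutate, evaluate, and select loop over a small population of parameter vectors. It is not taken from the survey and trains no GAN: the `fitness` function is a stand-in (in an EGAN that role would be played by some discriminator-based measure of sample quality and diversity), and all names and hyperparameters are assumptions.

```python
import random


def fitness(params):
    """Stand-in score, higher is better. Hypothetical: an EGAN would score
    a generator with a discriminator-based quality/diversity measure."""
    return -sum((p - 1.0) ** 2 for p in params)


def mutate(params, rng, sigma=0.3):
    """Gaussian perturbation of every parameter."""
    return [p + rng.gauss(0.0, sigma) for p in params]


def evolve(pop_size=4, n_children=8, dim=3, generations=30, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-2, 2) for _ in range(dim)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        children = [mutate(rng.choice(population), rng) for _ in range(n_children)]
        # (mu + lambda) selection: parents compete with their children,
        # so the best individual found so far is never discarded.
        population = sorted(population + children, key=fitness, reverse=True)[:pop_size]
        history.append(fitness(population[0]))
    return population[0], history


best, history = evolve()
```

Because parents survive into the selection pool, the best fitness recorded in `history` can never decrease from one generation to the next, which is the kind of stabilizing effect that selection is meant to contribute.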
Chemometrics and Intelligent Laboratory Systems, vol. 268, Article 105587.
Citations: 0
Optimization of caper bud drying using the DT_LSBOOST model: A predictive approach to improve quality and efficiency
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2025-11-20 DOI: 10.1016/j.chemolab.2025.105585
Chafika Lakhdari, Hocine Remini, Samia Djellal, Meriem Adouane, Hichem Tahraoui, Abdeltif Amrane, Farid Dahmoune, Merve Yavuz-Düzgün, Elif Feyza Aydar, Evren Demircan, Zehra Mertdinç, Beraat Ozçelik, Nabil Kadri
Capparis spinosa L. buds undergo salting and drying to extend their shelf life and improve their organoleptic properties. This study evaluates the impact of four drying methods, oven drying (OD), vacuum drying (VD), freeze-drying (FD), and microwave drying (MD), on the physicochemical, antioxidant, and microbiological properties of dried caper buds. Salting reduced the initial moisture content from 508.50 % to 168.59 % (db), and drying further decreased it to approximately 9 %. Drying time varied significantly: MD was the fastest (0.19–0.75 h) and OD the slowest (up to 49.66 h). FD had the highest energy consumption (60.77 kWh/kg), followed by VD, while OD and MD were the least energy-intensive (0.54–3.10 kWh/kg and 1.34–2.18 kWh/kg, respectively). FD preserved the most chlorophyll (193.63 μg/g DW) and the highest total phenolic content (TPC, 28.98 mg GAE/g DW), whereas MD at 200 W gave the lowest TPC (9.88 mg GAE/g DW). FD samples also showed superior antioxidant activity in both ABTS and FRAP assays. In contrast, OD and MD increased browning and degraded quality attributes. Multivariate analyses (PCA and clustering) identified FD as optimal for preserving quality and MD as the most detrimental. Microbiological analysis confirmed that the dried capers met food safety standards. A predictive model using Decision Tree coupled with Least Squares Boosting (DT_LSBOOST) achieved exceptional accuracy (R = 0.9999, RMSE = 0.0564, ESP = 0.2028, MAE = 0.0305), providing a reliable tool for optimizing drying parameters. Overall, freeze-drying emerged as the best method for retaining the nutritional and bioactive properties of capers, and the developed predictive model offers an innovative approach to improving caper processing efficiency.
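To make the idea of decision trees combined with least-squares boosting concrete, here is a minimal pure-Python sketch. It is not the authors' DT_LSBOOST model: it uses depth-1 trees (stumps) on a single input, a fixed learning rate, and synthetic data invented for illustration. Every function name is hypothetical. It also shows how R, RMSE, and MAE of the kind reported above are usually computed.

```python
import math


def fit_stump(xs, residuals):
    """Best single-split regression tree (depth 1) under squared error."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2  # midpoint split, so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lmean, rmean)
    _, thr, lmean, rmean = best
    return lambda x: lmean if x <= thr else rmean


def ls_boost(xs, ys, n_rounds=200, learning_rate=0.1):
    """Least-squares boosting: each stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        # Residuals are the negative gradient of the squared-error loss
        # (up to a constant factor).
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def rmse(ys, preds):
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))


def mae(ys, preds):
    return sum(abs(y - p) for y, p in zip(ys, preds)) / len(ys)


def pearson_r(ys, preds):
    my, mp = sum(ys) / len(ys), sum(preds) / len(preds)
    cov = sum((y - my) * (p - mp) for y, p in zip(ys, preds))
    vy = sum((y - my) ** 2 for y in ys)
    vp = sum((p - mp) ** 2 for p in preds)
    return cov / math.sqrt(vy * vp)


# Synthetic data, invented for illustration: moisture ratio vs. drying time (h).
times = [0.5 * k for k in range(21)]
moisture = [math.exp(-0.4 * t) for t in times]
model = ls_boost(times, moisture)
fitted = [model(t) for t in times]
baseline = [sum(moisture) / len(moisture)] * len(moisture)
```

The metrics here are computed on the training points only, so they say nothing about generalization. A fair evaluation would hold out data, which the abstract does not describe in detail.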
Chemometrics and Intelligent Laboratory Systems, vol. 268, Article 105585.
Citations: 0
High precision classification method for black tea: Deep learning combined with two-dimensional correlation spectroscopy
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2025-11-20 DOI: 10.1016/j.chemolab.2025.105580
Long Liu, Yifan Wang, Bin Wang, Xiaoxuan Xu, Jing Xu
Tea is a widely popular beverage across the globe. However, its medicinal content and value vary from one species to another, so consumers need a quick and efficient way to distinguish between species. This paper introduces a method for species classification that combines two-dimensional correlation spectroscopy (2DCOS) images with deep learning (DL) models. First, 345 thin-section samples of five different black teas were prepared and their near-infrared spectroscopy (NIRS) data were acquired. From the preprocessed one-dimensional NIRS data, 8280 2DCOS contour images and contour fill images were generated. A MobileNet model with various bottleneck residual blocks was constructed and trained on these 2DCOS images, achieving a classification accuracy of 100 %. Model testing indicated that the optimal NIRS preprocessing method and 2DCOS image format are Standard Normal Variate (SNV) transformation and contour fill images, respectively. Furthermore, the classification results for one-dimensional NIRS data, 2DCOS matrix data, and 2DCOS image data were compared, showing that 2DCOS images give the highest classification accuracy. Finally, comparative experiments between the MobileNet model and other deep learning models demonstrated that MobileNet has fewer parameters, a lower computational load, high accuracy, and fast convergence. Combining 2DCOS images with the MobileNet model is therefore effective for black tea classification. This paper offers a promising approach for identifying black tea species, with broad potential applications in species classification.
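The two preprocessing ideas named in this abstract, SNV and 2DCOS, can be sketched in a few lines. This is not the authors' pipeline. It shows the usual SNV definition (centre and scale each spectrum by its own mean and standard deviation) and only the synchronous 2DCOS matrix in the generalized form, using the mean spectrum as reference. The toy spectra are invented, and the abstract does not say how the 8280 images were actually generated.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard Normal Variate: centre and scale one spectrum by its own
    mean and (sample) standard deviation."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(v - m) / s for v in spectrum]


def synchronous_2dcos(spectra):
    """Synchronous correlation matrix
    Phi[j][k] = (1 / (m - 1)) * sum_i y_i[j] * y_i[k],
    where y_i is spectrum i minus the mean spectrum (the dynamic spectrum)."""
    m = len(spectra)
    n = len(spectra[0])
    reference = [mean(s[j] for s in spectra) for j in range(n)]
    dynamic = [[s[j] - reference[j] for j in range(n)] for s in spectra]
    return [
        [sum(d[j] * d[k] for d in dynamic) / (m - 1) for k in range(n)]
        for j in range(n)
    ]


# Toy spectra (made-up values): 3 measurements at 4 wavelengths.
raw = [
    [0.10, 0.40, 0.35, 0.20],
    [0.12, 0.50, 0.33, 0.22],
    [0.15, 0.62, 0.30, 0.25],
]
processed = [snv(s) for s in raw]
phi = synchronous_2dcos(processed)
```

The resulting matrix is symmetric, with the variance at each wavelength on its diagonal. Rendering it as a contour or filled-contour plot would yield the kind of image a convolutional network can be trained on.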
Chemometrics and Intelligent Laboratory Systems, vol. 268, Article 105580.
Citations: 0