Chemometrics and Intelligent Laboratory Systems: Latest Articles

Research on quality evaluation system and grade classification of Angelica dahurica based on artificial intelligence and multispectral technology
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2025-12-02 DOI: 10.1016/j.chemolab.2025.105610
Wei Nie, Xulong Huang, Jin Pei, Chaoxiang Ren, Tao Zhou, Jinyu Du, Huajuan Jiang, HanYi Zhang, Xin Li, Juan Li, Yuhang Li, Yueying Hu, Zhiyu Hao
Angelica dahurica (AD) is both a widely used spice and a precious traditional Chinese medicine. Currently, its quality evaluation depends predominantly on traditional identification methods and physicochemical assessments, which are often subjective or time-consuming and therefore poorly suited to rapid, non-destructive, and accurate quality evaluation. This study therefore constructed a quality evaluation system based on four key criteria: shape, color, odor, and texture. Experienced traditional medicine experts scored 611 samples according to this system, categorizing them into three quality grades. Imperatorin and isoimperatorin in 30 randomly selected batches were quantified by HPLC; their content correlated positively with quality grade, supporting the system's accuracy and reliability. Quality grading models were then established by combining multispectral imaging with artificial intelligence models such as CNN and Transformer. The Transformer model achieved the highest accuracy, 88.71%. Overall, this study improves the objectivity and reproducibility of traditional identification methods. It also demonstrates that integrating artificial intelligence with multispectral imaging enables non-destructive, rapid, and precise classification of AD, offering a novel approach for quality control of medicinal materials.
Volume 269, Article 105610
Citations: 0
Multimodal class-aware molecule language model for drug response prediction
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2025-12-02 DOI: 10.1016/j.chemolab.2025.105604
Yunfei Xia, Hui Yu, Xiaobo Zhou, Lichuan Gu, Qingyong Wang
Drug response prediction driven by artificial intelligence (AI) offers an efficient solution for accelerating the development of precision medicine and personalized treatment. However, existing AI methods are typically limited by high noise levels, heterogeneity, and limited modal data. These limitations decrease model performance and hinder the identification of critical biomarkers. Therefore, we propose a Multimodal Class-Aware Molecular Language (MCML) model for accurate drug response prediction. Specifically, MCML systematically integrates multimodal features of drugs and cell lines and establishes a cross-modal modeling mechanism to achieve deep fusion of multimodal information. Meanwhile, the model dynamically adjusts the contribution weights based on the class features and importance of the samples, effectively alleviating the noise interference inherent in multimodal data. Furthermore, MCML employs self-supervised learning for pre-training to capture potential molecular interaction patterns, enhancing its ability to adapt to data heterogeneity. Experiments performed on cross-scale multiomics datasets and single-cell transcriptomic data indicate that MCML significantly outperforms existing state-of-the-art models in RMSE and MAE scores. Case studies further show that the MCML model can effectively identify tumor microenvironment characteristics associated with drug resistance, indicating its ability to discover relevant biomarkers. Additionally, we performed an interpretability analysis of the model to investigate the impact of key features on the prediction results. This research establishes a new methodological paradigm for multimodal tumor data-driven drug response prediction and offers reliable computational tools for personalized cancer treatment decision making.
Volume 269, Article 105604
Citations: 0
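As a reading aid for the MCML entry above, which reports its results as RMSE and MAE scores, the following minimal Python sketch shows how these two error metrics are commonly computed. The function names and the toy response values are illustrative assumptions; none of them come from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error: mean of the absolute residuals."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


# Hypothetical drug-response values (made up for illustration only).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 8.0]

print("RMSE:", rmse(observed, predicted))
print("MAE:", mae(observed, predicted))
```

With a single large residual, RMSE comes out larger than MAE, which is why the two scores are often reported together: RMSE weights large errors more heavily.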
LRADA: An adaptive global-local fault diagnosis via low-rank subspace representation with prior-constrained discriminative framework
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2025-12-01 DOI: 10.1016/j.chemolab.2025.105608
Yi Luo, Jian Cheng, Si-Yu Chen, Lin-Hao Nie, Qian Cheng, Cheng-Shu Ye, Yuan Xu, Yang Zhao
Fault diagnosis of industrial processes is of great significance for reducing the risk of sensor damage and improving the safety and smooth operation of plants. Data-driven fault diagnosis methods show promise in eliminating redundant signals captured by sensors, enabling superior anomaly detection performance. However, current data-driven approaches face many challenges in industrial plant signal processing and feature extraction, such as insufficient robustness, poor interpretability of the projection space, and inadequate representation of nonlinear local structures. To address these challenges, we propose a novel diagnostic framework, the low-rank approximation fusion discriminant analysis (LRADA) model, to enhance fault diagnosis for industrial processes. In the LRADA method, the prior information of the low-rank subspace is used to learn the global low-rank attributes of the original space. These attributes are then embedded into an improved local linear discriminant analysis framework to enhance the discrimination between different classes. In addition, specific norm constraints are imposed on the projection space to facilitate sparse feature extraction and suppress noise interference. Finally, the effectiveness of the proposed diagnosis method is verified by experimental analysis on three benchmark datasets of industrial processes. The experimental results show the superiority of the proposed method in addressing the above challenges and improving fault diagnosis accuracy in industrial environments. LRADA is available at https://github.com/gitcodelist/LRADA.
Volume 269, Article 105608
Citations: 0
A semantic framework for drug-target affinity prediction using Mamba and graph convolutional networks for multimodal feature fusion
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2025-12-01 DOI: 10.1016/j.chemolab.2025.105601
Maoyuan Zhou, Jingjie He, Xingyu Liu, Junmin Huang, Jirui Zhang, Jiaxing Li, Xiaorui Huang, Qianjin Guo
Accurately assessing the strength of drug-target interactions, i.e. drug-target affinity (DTA), is pivotal in drug development. Enhancing DTA prediction precision necessitates effective protein representation methods. This study introduces MAGTSF-DTA, a multimodal feature fusion semantic framework leveraging Mamba and graph convolutional networks (GCN). For molecules, atomic-level graph structures are generated from SMILES sequences, and the Mamba module is integrated with GCN to achieve efficient semantic learning. Furthermore, protein-protein interaction (PPI) networks are incorporated, and hierarchical approaches (HMANet & LMANet) are designed to integrate diverse protein features, enriching protein semantic representations. Experiments demonstrate that the proposed model significantly improves prediction accuracy on benchmark datasets compared to state-of-the-art techniques, validating the effectiveness of the Mamba architecture in DTA prediction and showcasing the model's generalization and interpretability.
Volume 269, Article 105601
Citations: 0
Integrated monitoring and diagnosis of industrial processes based on causality synergistic and unique decomposition
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2025-11-30 DOI: 10.1016/j.chemolab.2025.105593
Shijie Zhu, Qi Zhang, Shuai Li, Yang Fu, Dongni Jia, Yigeng Wang
Causality mining plays a crucial role in monitoring complex industrial processes. However, incomplete extraction of quality-related information may reduce the monitoring accuracy for quality-related faults, while uncertain causal relationships during root variable mining can further lead to incorrect fault diagnosis outcomes. To address these problems, we decompose the causal relationships between variables into synergistic and unique ones and further propose an integrated monitoring and diagnosis approach for industrial processes based on causality synergistic and unique decomposition. Firstly, we use Granger causality to preliminarily identify quality-related features and enhance their extraction via the synergistic effect of causal relationships, addressing their complex interdependence. Secondly, because synergistic causality exists among variables across variable groups, it is necessary to capture and model their dynamic characteristics to ensure monitoring accuracy. We extend quality variable fault monitoring to process variables and thereby achieve integrated monitoring. Finally, we explore causal uniqueness to identify the fault root cause, which is key to achieving precise and rapid diagnosis in complex and uncertain industrial processes. The feasibility and effectiveness of the proposed method were validated in two scenarios: the benchmark Tennessee Eastman (TE) chemical process and an industrial case study of poor iron ore beneficiation.
Volume 269, Article 105593
Citations: 0
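The entry above uses Granger causality as a first screening step for quality-related features. As a rough illustration of the general idea only (not the authors' decomposition method), the Python sketch below computes a one-lag Granger F statistic: it compares an autoregressive model of the effect series with a model that also includes the lagged candidate cause. The helper names, the single lag, and the synthetic series are all assumptions made for this example.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def rss(X, y):
    """Residual sum of squares of the ordinary least-squares fit y ~ X."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(X, y))


def granger_f(cause, effect):
    """F statistic for 'cause Granger-causes effect' using one lag."""
    target = effect[1:]
    restricted = [[1.0, effect[t - 1]] for t in range(1, len(effect))]
    full = [[1.0, effect[t - 1], cause[t - 1]] for t in range(1, len(effect))]
    rss_r, rss_u = rss(restricted, target), rss(full, target)
    dof = len(target) - 3  # observations minus parameters of the full model
    return (rss_r - rss_u) / (rss_u / dof)


# Synthetic example: y follows x with a one-step delay, plus small noise.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 200)]

f_xy = granger_f(x, y)  # x -> y: expected to be large
f_yx = granger_f(y, x)  # y -> x: expected to be small
print("F(x -> y):", f_xy)
print("F(y -> x):", f_yx)
```

In practice a tested library routine with lag selection and p-values would be preferable; the hand-rolled least-squares solver here only keeps the sketch free of dependencies.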
Online monitoring of phytochemical dynamics in black tea processing using MIP-driven classifier models
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2025-11-30 DOI: 10.1016/j.chemolab.2025.105611
Debanjana Ghosh, Debangana Das, Shreya Nag, Runu Banerjee Roy
Black tea processing involves variation in phytochemical constituents through multiple stages, with the tea quality index varying according to these biomarkers. In this study, modified classifier models were used to monitor two key biomarkers, catechin and epigallocatechin gallate (EGCG), throughout the five distinct stages of black tea processing for tea quality estimation. Catechin- and EGCG-selective molecularly imprinted polymer (MIP) electrodes were prepared to record differential pulse voltammetry (DPV) responses at the five processing stages. The DPV responses were analyzed to discriminate the processing stages based on catechin and EGCG content, using a stacked model incorporating four classification algorithms (random forest, K-nearest neighbors, Gaussian Naive Bayes (NB), and gradient boosting) and an Artificial Neural Network (ANN) classifier model. The proposed models exhibited satisfactory performance in classifying five different stages of fermentation for four different tea samples, with accuracies of 98% for catechin and 95% for EGCG. Principal Component Analysis (PCA) plots show the capability of the sensors to identify each stage of tea processing as a distinct cluster. The sensor response also exhibited a consistent pattern of change in catechin and EGCG contents across the various stages of tea processing.
Volume 269, Article 105611
Citations: 0
Improved variable reduction in Partial Least Squares modelling by global-minimum error reproducible Uninformative-Variable Elimination
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2025-11-29 DOI: 10.1016/j.chemolab.2025.105603
Jan P.M. Andries, Gerjen H. Tinnevelt, Yvan Vander Heyden
The well-known Uninformative-Variable Elimination for Partial Least Squares, denoted as UVE-PLS, is not reproducible with respect to the selected variables. Additionally, in UVE, variables are selected at the first minimum of the curve of the root mean squared error of cross-validation (RMSECV) against the number of retained variables. This mostly results in rather large numbers of selected variables. There is therefore a need for a new, reproducible UVE method with better selective and preferably also better predictive abilities. Consequently, the Global-Minimum Error Reproducible Uninformative-Variable Elimination method, denoted as GME-RUVE, is proposed and tested.
In the GME-RUVE method, the main characteristics of two existing methods, i.e. Jack-knife-based Partial Least Squares Regression (JK-PLSR) and Global-Minimum Error Uninformative-Variable Elimination (GME-UVE), are combined. JK-PLSR can be considered a reproducible version of the original UVE method.
In GME-RUVE, as in the JK-PLSR method, no artificial random variables are added to the X matrix; first, the significance of the PLS regression coefficients is determined by jack-knifing. Second, as in the GME-UVE method, either the global minimum or the critical RMSECV is used to select the variables. The performance of the new GME-RUVE method is investigated using four datasets with multivariate profiles, i.e. simulated profiles, NIR spectra, or theoretical molecular descriptor profiles, resulting in 12 profile-response (X-y) combinations.
The predictive performance of GME-RUVE using the global RMSECV minimum, and both the selective and predictive performances of GME-RUVE using the critical RMSECV, are significantly better than those of the JK-PLSR method, which uses the first local RMSECV minimum, and of the existing UVE method. The selective and predictive performances of the new GME-RUVE method are also much better than those of the existing GME-UVE method. Moreover, the variables selected by the GME-RUVE method have a chemical meaning.
Volume 269, Article 105603
Citations: 0
A multi-scale approach integrating hyperspectral system for tracing the origin of agricultural products
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2025-11-27 DOI: 10.1016/j.chemolab.2025.105592
Yuqi Ren, He Wang, Chongbo Yin, Hong Men, Yan Shi, Jingjing Liu
Agricultural products of the same variety can differ in quality, appearance, and nutritional value due to variations in climate, soil, and other growth conditions. To support reliable and sustainable origin traceability, we propose a non-destructive framework using hyperspectral data. Spectral information for rice and peanut samples from multiple production regions was acquired using a GaiaSorter hyperspectral imaging system. Hyperspectral imaging can rapidly detect chemical bonds and functional groups, and differences in these features reflect the overall microstructural quality of agricultural products from different origins. A novel Quadrangle Attention with Deformation (QAD) module was designed to enhance multi-scale feature learning. The module applies geometric transformations within local windows and incorporates relative positional encoding to capture multi-scale receptive-field information, thereby better capturing relationships between spectral bands. By embedding the QAD module into a separable-convolution backbone, we developed the Quadrangle Attention with Deformation Network (QAD-Net) for precise origin identification. On two benchmark datasets, QAD-Net achieved state-of-the-art accuracy, reaching 99.66 ± 0.57 % on peanuts and 99.57 ± 0.65 % on rice, outperforming existing models. This work demonstrates the potential of QAD-Net as a fast, accurate, and non-destructive tool for hyperspectral origin traceability, with significant implications for on-site quality supervision, authenticity verification, and sustainable market regulation.
Chemometrics and Intelligent Laboratory Systems, Volume 269, Article 105592. Journal Article. Published 2025-11-27.
Citations: 0
Explainable machine learning enables robust evaluation of extracted ion chromatograms in LC–MS metabolomics
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2025-11-26 DOI: 10.1016/j.chemolab.2025.105591
Juehong Dai, Liheng Dong, Jingjing Xu, Lingli Deng, Lei Guo, Jiyang Dong
Reliable evaluation of extracted ion chromatograms (EICs) remains a persistent challenge in LC–MS metabolomics, as inaccuracies in peak identification can profoundly impact subsequent data analysis and interpretation. While recent deep learning approaches show promise, their computational burden, limited generalizability, and lack of interpretability hinder broad adoption in routine analytical workflows. To address these limitations, we introduce EXACT-EIC (EXplainable Assessment of Chromatogram qualiTy for EICs), a lightweight, explainable machine learning framework. EXACT-EIC employs a carefully designed set of 34 handcrafted features to perform two critical tasks: binary classification of EICs (peak vs. noise) and quantitative quality scoring. Benchmarking on curated in-house and public test sets demonstrated that EXACT-EIC achieved 95.2 % accuracy and 98.1 % recall for classification. For quantitative assessment, it attained a mean absolute error of 0.70 on a 1–10 expert-assigned quality scale. These results consistently outperformed state-of-the-art deep learning methods including PeakOnly and QuanFormer. Furthermore, SHapley Additive exPlanations (SHAP) analysis quantified the contribution of key chromatographic features (e.g., apex-boundary ratio, distribution entropy) to model predictions, offering transparent mechanistic insights absent in "black-box" architectures. By combining robustness, interpretability, and computational efficiency, EXACT-EIC facilitates reliable EIC evaluation across diverse platforms and experimental conditions. It provides a practical, deployable solution for automated quality control and confident metabolite annotation, addressing a critical need in untargeted LC–MS metabolomics workflows.
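The abstract names two of the handcrafted features, apex-boundary ratio and distribution entropy, without defining them. The sketch below shows what features of this kind might look like. Both definitions are assumptions made for illustration (apex intensity over the mean edge intensity, and Shannon entropy of the normalised intensities); they are not the definitions used by EXACT-EIC, and the two example traces are invented.

```python
import math
from typing import Sequence


def apex_boundary_ratio(intensities: Sequence[float], eps: float = 1e-9) -> float:
    """ASSUMED definition: apex intensity divided by the mean of the two edge intensities."""
    apex = max(intensities)
    boundary = (intensities[0] + intensities[-1]) / 2.0
    return apex / (boundary + eps)  # eps avoids division by zero on a zero baseline


def distribution_entropy(intensities: Sequence[float]) -> float:
    """ASSUMED definition: Shannon entropy (bits) of the intensities normalised to sum to 1."""
    total = sum(intensities)
    if total <= 0:
        return 0.0
    probs = [x / total for x in intensities if x > 0]
    return -sum(p * math.log2(p) for p in probs)


# Hypothetical EIC traces: a sharp symmetric peak and a flat noisy baseline.
peak = [1, 2, 10, 50, 100, 50, 10, 2, 1]
noise = [10, 12, 9, 11, 10, 12, 9, 11, 10]

for name, trace in (("peak", peak), ("noise", noise)):
    print(name, round(apex_boundary_ratio(trace), 2), round(distribution_entropy(trace), 2))
```

Under these assumed definitions, a real peak has a high apex-boundary ratio and a lower entropy than a flat noisy trace, so a lightweight classifier could separate the two without a deep network.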
Chemometrics and Intelligent Laboratory Systems, Volume 269, Article 105591. Journal Article. Published 2025-11-26.
Citations: 0
A classification model for early detection of breast cancer by Raman spectroscopy based on categorical embedding transformer
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2025-11-24 DOI: 10.1016/j.chemolab.2025.105589
Chaoyuan Hou, Fei Xie, Guohua Wu, Wenting Yu, Houpu Yang, Liu Yang, Xuewen Long, Longfei Yin, Shu Wang
Raman spectroscopy combined with deep learning is now widely used in disease screening. The Transformer is an important deep learning architecture that has excelled in several areas, owing in part to its self-attention mechanism. However, because it was originally designed for natural language processing, the Transformer has drawbacks when processing spectral data, such as high computational complexity and a tendency to overfit on small datasets. In this study, we propose a spectral classification model called the Categorical Embedding Transformer (CET) and apply it, together with Raman spectroscopy, to the screening of breast cancer and ductal carcinoma in situ. The core principle of the CET model is to embed class labels as fixed-dimensional vectors and update them as learnable parameters during training. The CET model also removes the positional encoding in the Transformer encoder and the initial linear layer used to reduce or increase dimensionality, while retaining the structure used for feature extraction and dimensionality reduction of spectral data, so that computational complexity is reduced without losing these capabilities. Finally, the dot product is used to calculate the similarity between each class vector and the dimensionality-reduced spectrum, and the cross-entropy loss function is used during training to maximize the dot-product similarity of the true class. The model achieved 100 % accuracy on the validation set and 98.2 % accuracy on the unknown test set, outperforming the other models compared.
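The classification step described in the abstract (dot-product similarity between learnable class vectors and the reduced spectrum, trained with cross-entropy) can be sketched in a few lines. This is an illustration only, not the authors' implementation: the encoder is replaced by a fixed, made-up feature vector, the number of classes (three) and all numeric values are assumptions, and every function name is hypothetical.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_probabilities(feature: Sequence[float],
                        class_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Softmax over the dot-product similarity between the feature and each class vector."""
    return softmax([dot(feature, c) for c in class_vectors])


def cross_entropy(probs: Sequence[float], true_class: int) -> float:
    return -math.log(probs[true_class])


def sgd_step(feature: Sequence[float], class_vectors: Sequence[Sequence[float]],
             true_class: int, lr: float = 0.1) -> List[List[float]]:
    """One gradient step on the class vectors only (the feature is held fixed here)."""
    probs = class_probabilities(feature, class_vectors)
    updated = []
    for k, c in enumerate(class_vectors):
        grad_scale = probs[k] - (1.0 if k == true_class else 0.0)  # dLoss/dScore_k
        updated.append([ci - lr * grad_scale * fi for ci, fi in zip(c, feature)])
    return updated


# Hypothetical reduced spectrum and three hypothetical class vectors.
feature = [0.5, -1.0, 2.0]
class_vectors = [[0.1, 0.0, 0.1], [0.0, 0.1, -0.1], [0.2, 0.2, 0.0]]
true_class = 1

loss_before = cross_entropy(class_probabilities(feature, class_vectors), true_class)
class_vectors_after = sgd_step(feature, class_vectors, true_class)
loss_after = cross_entropy(class_probabilities(feature, class_vectors_after), true_class)
print(loss_before, loss_after)
```

Minimising the cross-entropy pulls the true class vector toward the feature and pushes the others away, which is what "maximize the dot-product similarity of the true class" amounts to in this simplified setting.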
Chemometrics and Intelligent Laboratory Systems, Volume 268, Article 105589. Journal Article. Published 2025-11-24.
Citations: 0