
Chemometrics and Intelligent Laboratory Systems: Latest Articles

Decentralized federated learning enables privacy-preserving NIR spectroscopy calibration: A proof-of-concept study
IF 3.8, Zone 2 (Chemistry), Q2 AUTOMATION & CONTROL SYSTEMS. Pub Date: 2025-11-18. DOI: 10.1016/j.chemolab.2025.105583
Yuan-yuan Chen
Near-infrared (NIR) spectroscopy is a key analytical tool across industries, providing fast, non-destructive measurements. However, traditional centralized models face challenges regarding data privacy, instrument heterogeneity, and limited inter-institutional collaboration. We present a decentralized federated learning (DFL) system for NIR spectroscopy that enables institutions to collaboratively train accurate models without sharing raw data. The proposed system combines standardized spectral preprocessing with lightweight communication protocols to achieve modeling efficiency and data confidentiality. Extensive experiments were conducted on augmented Corn and Gasoline datasets using PLSR, SVR, and 1D-CNN models. In our simulations, we modeled a network of 30 clients communicating via a ring topology and applied FedProx regularization (μ = 0.1). The proposed DFL system produces predictions within 5–8 % of centralized results, while its architecture inherently offers improved scalability, fault tolerance, and privacy protection. The combination of FedProx and model personalization preserves training stability under non-IID data conditions, recovering 20 % of lost accuracy. In cross-instrument scenarios, the DFL approach outperforms both local-only and standard centralized FL models, reducing prediction errors by up to 52 % and showing strong generalization to new devices. While DFL requires more training rounds, system efficiency analysis shows its total communication cost is 25 % lower than that of centralized FL. Our results indicate that DFL is a promising and practical approach for NIR spectroscopy, offering privacy, scalability, and generalizability for real-world, multi-party deployments with heterogeneous devices. However, performance can decline under extreme data heterogeneity, highlighting the need for further enhancements in model personalization.
Chemometrics and Intelligent Laboratory Systems, Volume 268, Article 105583.
Citations: 0
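The ring-topology setup with FedProx regularization (μ = 0.1, 30 clients) mentioned in the abstract above can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the model is a single scalar slope, the data are synthetic, the proximal reference is each client's weight at the start of the round, and the names `local_fedprox_step`, `ring_round`, and `simulate` are hypothetical.

```python
import random


def local_fedprox_step(w, w_ref, data, mu=0.1, lr=0.05):
    """One gradient step on the local MSE plus the proximal term (mu/2)*(w - w_ref)**2."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    grad += mu * (w - w_ref)  # FedProx term: penalizes drift from the reference weight
    return w - lr * grad


def ring_round(weights, datasets, mu=0.1, lr=0.05, local_steps=5):
    """Each client trains locally, then averages with its two ring neighbours."""
    trained = []
    for w, data in zip(weights, datasets):
        w_ref = w  # assumption: reference is the weight held at the start of the round
        for _ in range(local_steps):
            w = local_fedprox_step(w, w_ref, data, mu, lr)
        trained.append(w)
    n = len(trained)
    # No central server: client i only sees clients i-1 and i+1 (indices wrap around).
    return [(trained[i - 1] + trained[i] + trained[(i + 1) % n]) / 3 for i in range(n)]


def simulate(n_clients=30, rounds=200, seed=0):
    """Synthetic experiment: every client observes y = 2*x plus small noise."""
    rng = random.Random(seed)
    true_slope = 2.0
    datasets = []
    for _ in range(n_clients):
        xs = [rng.uniform(0.5, 1.5) for _ in range(20)]
        datasets.append([(x, true_slope * x + rng.gauss(0, 0.05)) for x in xs])
    weights = [0.0] * n_clients
    for _ in range(rounds):
        weights = ring_round(weights, datasets, mu=0.1)
    return weights
```

Calling `simulate()` returns one weight per client. In this toy setting all clients should end up close to the shared slope even though raw data never leave a client; only weights are exchanged with neighbours.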
Advanced hyperparameter optimization for lung cancer detection using DenseBeetle network
IF 3.8, Zone 2 (Chemistry), Q2 AUTOMATION & CONTROL SYSTEMS. Pub Date: 2025-11-18. DOI: 10.1016/j.chemolab.2025.105584
Jyoti Kumari , Sapna Sinha , Laxman Singh
Lung cancer remains a leading cause of cancer-related mortality, underscoring the urgent need for accurate and early detection to improve patient outcomes. However, current detection systems often struggle with elevated false-positive rates and insufficient feature extraction. These challenges largely stem from the visual resemblance between nodules and nearby tissues, as well as the inability of conventional models to effectively capture the complex features of pulmonary nodules. This research presents a deep learning-based approach for identifying lung nodules in CT images. The framework incorporates advanced preprocessing steps such as Gaussian filtering and Contrast Limited Adaptive Histogram Equalization to enhance image sharpness and overall visual quality. A Residual Pyramid Attention-Enhanced DenseNet201, integrated with SE and CBAM modules, is used for effective feature extraction, while a sigmoid function supports binary classification. Hyperparameter tuning is performed using a novel optimizer based on Latin Hypercube Sampling and Mean Differential Variation. Evaluated on the LUNA16 dataset with 888 CT scans, the model reached 98.7 % accuracy, 99.2 % sensitivity, and a 95.38 % F1-score on the test set. The framework significantly reduces false positives and demonstrates strong generalization for clinical lung cancer identification.
Chemometrics and Intelligent Laboratory Systems, Volume 268, Article 105584.
Citations: 0
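The abstract above names Latin Hypercube Sampling as one ingredient of the hyperparameter optimizer. The sketch below shows only plain Latin Hypercube Sampling of an initial candidate set. It does not reproduce the paper's optimizer or its Mean Differential Variation step, and the function name `latin_hypercube` and the example bounds are assumptions chosen for illustration.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points so that every dimension has exactly one point per stratum.

    bounds is a list of (low, high) pairs, one per hyperparameter.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # Split [low, high) into n_samples equal strata and visit them in random order.
        strata = list(range(n_samples))
        rng.shuffle(strata)
        columns.append(
            [low + (s + rng.random()) / n_samples * (high - low) for s in strata]
        )
    # Row i combines the i-th value of every dimension into one candidate.
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


# Hypothetical search space: (dropout-like value in [0, 10), learning rate in [1e-4, 1e-1)).
candidates = latin_hypercube(10, [(0.0, 10.0), (1e-4, 1e-1)])
```

Compared with independent uniform draws, this stratification spreads the initial candidates evenly along each axis, which is the usual reason for using it to seed a population-based search.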
Feature extraction using differential amplification singular value decomposition in Vis–NIR spectroscopy: Application to cigarette brand identification
IF 3.8, Zone 2 (Chemistry), Q2 AUTOMATION & CONTROL SYSTEMS. Pub Date: 2025-11-18. DOI: 10.1016/j.chemolab.2025.105579
Biao Tang , Chengbo Yang , Jianchun Li , Jingjun Wu
Rapid and accurate identification of cigarette brands is crucial for combating counterfeiting and protecting tax revenue. Vis–NIR spectroscopy combined with machine learning is a promising identification method. Nevertheless, redundant information abounds in high-dimensional spectral data, which affects classification accuracy. To address this challenge, this study proposes a novel feature extraction method, differential amplification singular value decomposition (DA-SVD). This method optimizes the feature projection direction by amplifying both the individual differences among samples and the overall differences between classes, thereby achieving effective dimensionality reduction of spectral data. By applying DA-SVD, the classification accuracy of KNN, SVM, and RF models on the test set increased significantly from 36 %, 34 %, and 30 % (based on the original data) to 98 % for all models, with precision, sensitivity, and F1 score reaching 97.86 %, 98.14 %, and 97.86 %, respectively, outperforming conventional feature extraction methods such as LDA, SVD, and PCA. The experimental results further demonstrated that DA-SVD could achieve satisfactory classification performance without additional preprocessing steps (outlier detection and spectral denoising). In addition, the 10-fold cross-validation results confirmed the stability of the DA-SVD method, and validation on public datasets further demonstrated its generalization ability and applicability. Overall, DA-SVD provides an efficient and robust feature extraction strategy that, when combined with machine learning, enables reliable cigarette brand identification and has broad potential for other spectroscopic applications.
Chemometrics and Intelligent Laboratory Systems, Volume 268, Article 105579.
Citations: 0
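The abstract above reports 10-fold cross-validation with a KNN classifier on extracted features. The sketch below illustrates that evaluation loop only. The DA-SVD projection itself is not reproduced, because the abstract does not give its formulation, so the feature vectors are assumed to be already extracted. The names `k_fold_indices`, `knn_predict`, and `cross_val_accuracy` are hypothetical.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle sample indices and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train, query, k=3):
    """Majority vote among the k training samples closest to query (squared Euclidean)."""
    nearest = sorted(
        train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], query))
    )[:k]
    labels = [label for _, label in nearest]
    return max(sorted(set(labels)), key=labels.count)


def cross_val_accuracy(samples, k_folds=10, k=3, seed=0):
    """Return one accuracy per fold; samples is a list of (feature_vector, label)."""
    scores = []
    for fold in k_fold_indices(len(samples), k_folds, seed):
        held_out = set(fold)
        train = [s for i, s in enumerate(samples) if i not in held_out]
        correct = sum(
            knn_predict(train, samples[i][0], k) == samples[i][1] for i in fold
        )
        scores.append(correct / len(fold))
    return scores
```

The spread of the per-fold scores, rather than their mean alone, is what a stability claim of this kind usually rests on.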
Environment aging analysis of animal bloodstains with ATR-FTIR and CNN
IF 3.8, Zone 2 (Chemistry), Q2 AUTOMATION & CONTROL SYSTEMS. Pub Date: 2025-11-14. DOI: 10.1016/j.chemolab.2025.105576
Chun-Ta Wei , Zexin Shen , Wenbin Luo , Jingyi Zhao , Tingting Yin , Kaining Cheng , Miao Zhang
Bloodstain analysis is a critical component of forensic science, particularly for determining the time of deposition and understanding the effects of environmental conditions on evidence. This study presents an innovative bloodstain environmental aging model, which integrates attenuated total reflection Fourier transform infrared spectroscopy (ATR-FTIR) with a convolutional neural network (CNN) optimized using the black-winged kite algorithm. Bloodstains from common sources (pig, cow, and chicken) were analyzed under varying environmental conditions, including temperature fluctuations (0 °C, 40 °C, 100 °C) and simulated sunlight exposure, across multiple aging periods (1, 2, 4, 8 days). Spectral data obtained through ATR-FTIR scanning served as the input for the optimized CNN, enabling precise differentiation and classification of bloodstains based on aging and environmental factors. The model achieved high predictive accuracy, with 97.86 % for pig blood, 95.47 % for cow blood, and 97.15 % for chicken blood under 0 °C conditions, demonstrating its robustness and reliability in forensic applications. Additionally, this research highlights the potential for integrating spectroscopic data with advanced deep learning techniques to enhance forensic methodologies. By improving accuracy, accessibility, and cost-effectiveness, this work represents a significant advancement in bloodstain analysis and forensic science.
Chemometrics and Intelligent Laboratory Systems, Volume 268, Article 105576.
Citations: 0
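The abstract above feeds ATR-FTIR spectra into a CNN. As a rough illustration of what a single convolutional stage does to a one-dimensional spectrum, the sketch below applies a filter, a ReLU, and max pooling in plain Python. The kernel values and function names are assumptions. The paper's architecture and its black-winged kite optimization are not reproduced here.

```python
def conv1d(signal, kernel):
    """Valid-mode 1D cross-correlation, the operation used by CNN layers."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Zero out negative activations."""
    return [max(0.0, v) for v in values]


def max_pool(values, size=2):
    """Keep the largest activation in each non-overlapping window of the given size."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Toy "spectrum" with one narrow peak and a hypothetical peak-detecting kernel.
spectrum = [0, 0, 1, 0, 0]
features = max_pool(relu(conv1d(spectrum, [-1, 2, -1])), size=2)
```

In a trained network the kernel weights are learned rather than hand-picked, and many such filters run in parallel before the classification layers.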
Machine learning and evolutionary computation on e-nose datasets: A preliminary approach to ergot alkaloid detection in wheat
IF 3.8, Zone 2 (Chemistry), Q2 AUTOMATION & CONTROL SYSTEMS. Pub Date: 2025-11-13. DOI: 10.1016/j.chemolab.2025.105574
Chiara Giliberti , Giulia Magnani , Monica Mattarozzi , Marco Giannetto , Federica Bianchi , Maria Careri , Stefano Cagnoni
To the best of the authors' knowledge, this is the first time that an approach based on the use of machine learning (ML) algorithms combined with genetic programming (GP) was used to process small-sample-size e-nose data. The approach was proposed to classify the volatile compound information of wheat samples based on the contamination of ergot alkaloids, a class of emerging mycotoxins which pose a severe threat to food safety and consumer health. Unlike previous studies that applied convolutional neural networks to full e-nose response profiles, our approach focused on a small set of features extracted from the steady-state region of each response curve. Despite the low dimensionality, using GP to generate optimal features significantly improved the classification performance of several ML models. Different classifiers, including Decision Tree, Linear Discriminant Analysis, the Mahalanobis Distance Classifier, an artificial neural network-based method and ensemble methods were assessed and applied to a dataset of 21 wheat samples. These samples were classified according to their compliance with the EU maximum limit of 150 μg/kg for ergot alkaloids in wheat. The combined application of GP-based feature transformations, specifically using M3GP, and ML classifiers resulted in significant improvements in accuracy, F1 score, precision and recall compared to models trained on untransformed features. These findings highlight the unexplored potential of GP as a powerful tool for feature construction in sensor-based classification tasks for food safety signal processing.
Chemometrics and Intelligent Laboratory Systems, Volume 268, Article 105574.
Citations: 0
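The abstract above labels wheat samples by compliance with the EU maximum limit of 150 μg/kg and compares classifiers on accuracy, F1 score, precision, and recall. The sketch below shows how such labels and metrics could be computed. It assumes that a concentration exactly at the limit counts as compliant and that the non-compliant class is treated as the positive class. The function names and example concentrations are hypothetical, and the GP/M3GP feature construction is not reproduced.

```python
EU_LIMIT_UG_PER_KG = 150.0  # maximum limit quoted in the abstract


def compliance_label(concentration_ug_per_kg):
    """Assumption: a value equal to the limit is still compliant."""
    if concentration_ug_per_kg <= EU_LIMIT_UG_PER_KG:
        return "compliant"
    return "non-compliant"


def binary_metrics(y_true, y_pred, positive="non-compliant"):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical reference concentrations and classifier outputs.
truth = [compliance_label(c) for c in (20.0, 149.0, 151.0, 400.0)]
predicted = ["compliant", "non-compliant", "non-compliant", "non-compliant"]
metrics = binary_metrics(truth, predicted)
```

With only 21 samples, as in the study, each misclassification shifts these metrics by several percentage points, so the metrics are best read together rather than individually.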
Nondestructive detection of total flavonoids content in daylily using Vis-NIR and NIR hyperspectral imaging: data fusion combined with SHAP for model interpretability
IF 3.8, Zone 2 (Chemistry), Q2 AUTOMATION & CONTROL SYSTEMS. Pub Date: 2025-11-13. DOI: 10.1016/j.chemolab.2025.105575
Xuexia Ma, Na Li, Ruifeng Wang, Jiaxue Ma, Ninghua Zhu, Tingting Li, Zhongxiong Zhang, Haifeng Li, Songlei Wang, Haihong Zhang
Flavonoids, vital bioactive compounds in daylily (a nutritionally and medicinally valuable food), have antioxidant, anti-inflammatory, antibacterial, and antidepressant properties, which boost its nutritional value, health benefits, and quality. Hyperspectral imaging (HSI) for detecting trace flavonoids usually uses single systems, failing to leverage multispectral complementarity and restricting detection performance. This study integrates a data fusion strategy with two HSI techniques (visible–near-infrared (Vis-NIR) and near-infrared (NIR)) for the non-destructive detection of total flavonoids content (TFC) in daylily. The investigation employed partial least squares regression (PLSR) and least squares support vector machine (LS-SVM) for spectral data. Additionally, data-level and feature-level fusion strategies are implemented for data fusion modeling, while the SHapley Additive exPlanations (SHAP) methodology is used to comprehensively evaluate spectral feature contribution rates. The findings demonstrate that modeling based on the fusion strategy of LS-SVM yields substantially superior results compared to single-system approaches. Notably, the mid-level fusion model incorporating competitive adaptive reweighted sampling (CARS) and LS-SVM demonstrates optimal performance. The determination coefficient (R²P), root mean square error of prediction (RMSEP), and residual prediction deviation (RPD) of the prediction set were 0.9332, 0.0186, and 3.3560, respectively. This study confirms the feasibility of HSI technology in non-destructively detecting flavonoids in daylily. Furthermore, the collaborative optimization of multi-spectral HSI systems through a data fusion strategy effectively enhances the accuracy of non-destructive flavonoid detection. This study presents innovative technical approaches for non-destructive trace substance detection and agricultural product quality and safety monitoring, thereby providing essential technical support for developing intelligent agricultural product quality and safety monitoring systems.
Chemometrics and Intelligent Laboratory Systems, Volume 268, Article 105575.
Citations: 0
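The abstract above reports three prediction-set metrics: the determination coefficient, RMSEP, and RPD. The sketch below shows one common way to compute them from reference and predicted values. It assumes RPD is defined as the sample standard deviation of the reference values (n − 1 denominator) divided by RMSEP. Conventions differ between papers, and the abstract does not state which one was used. The function name and example values are hypothetical.

```python
import math


def prediction_metrics(y_true, y_pred):
    """Return (R^2, RMSEP, RPD) for a prediction set."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmsep = math.sqrt(ss_res / n)
    sd_reference = math.sqrt(ss_tot / (n - 1))  # assumption: sample standard deviation
    return r2, rmsep, sd_reference / rmsep


# Hypothetical reference values and model predictions.
r2, rmsep, rpd = prediction_metrics([1, 2, 3, 4, 5], [1.1, 1.9, 3.1, 3.9, 5.1])
```

Because RPD scales the prediction error by the spread of the reference values, it lets models be compared across datasets whose concentration ranges differ.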
A combination of gas detection system and adaptive deep learning network (GFC-Net) to identify different production batches of beer
IF 3.8, Zone 2 (Chemistry), Q2 AUTOMATION & CONTROL SYSTEMS. Pub Date: 2025-11-04. DOI: 10.1016/j.chemolab.2025.105557
Junliang Han , Feifei Tong , Chuansheng Tang , Titi Liu
Even for products of the same brand, the quality of beer may vary across different production batches. Strict quality testing is essential to ensure product consistency, safety, and consumer satisfaction. In this work, an e-nose system, combined with the proposed deep learning algorithm, achieves the qualitative identification of beers from different production batches. First, the e-nose system is applied to acquire the gas information of beers from different production batches. Then, to comprehensively extract features characterizing the gas information, a fusion computational module that integrates local and global features from convolution and self-attention mechanism is proposed, called the Gas Features Calculation Module (GFCM). Finally, a Gas Features Classification Network (GFC-Net) is designed to enable the adaptive identification of beers from different production batches. Through structural optimization, ablation experiments, and comparison with state-of-the-art gas classification methods, GFC-Net achieves an accuracy of 98.50 %, a precision of 98.70 %, and a recall of 98.58 %. The integration of gas information that characterizes the overall chemical quality, along with GFC-Net, enables the qualitative identification of beers from different batches, providing an effective approach for quality monitoring.
Chemometrics and Intelligent Laboratory Systems, volume 268, Article 105557. Published 2025-11-04.
Citations: 0
Sampling-based computation of the sets of feasible solutions and feasible bands for noisy data
IF 3.8 | Zone 2, Chemistry | Q2 Automation & Control Systems | Pub Date: 2025-11-04 | DOI: 10.1016/j.chemolab.2025.105565
Mathias Sawall , Tomass Andersons , Chunhong Wei , Christoph Kubis , Klaus Neymeyr
Multivariate curve resolution often suffers from solution ambiguity, with many nonnegative factorizations fitting the data equally well. Building on the algorithm of Laursen and Hobolth (2022), we present an efficient sampling algorithm that can handle noisy data even containing negative entries. The algorithm iteratively updates factor columns via affine combinations within a nested loop structure, effectively approximating the sets of feasible solutions, the feasible bands, as well as the dual profiles. We apply the algorithm to two in situ FTIR spectroscopic data sets tracking the decomposition and activation of rhodium carbonyl complexes for the hydroformylation process. A comparison against established algorithms for these data sets indicates the robustness and computational efficiency of the algorithm.
Chemometrics and Intelligent Laboratory Systems, volume 268, Article 105565. Published 2025-11-04.
Citations: 0
Investigation on different strategies of significance testing in ANOVA-simultaneous component analysis (ASCA)
IF 3.8 | Zone 2, Chemistry | Q2 Automation & Control Systems | Pub Date: 2025-11-04 | DOI: 10.1016/j.chemolab.2025.105573
Faezeh Maddahi , Mahsa Akbari Lakeh , Jamile Mohammad Jafari , Farnoosh Koleini , Siewert Hugelier , Paul J. Gemperline , Hamid Abdollahi
ANOVA Simultaneous Component Analysis (ASCA) integrates analysis of variance with multivariate modelling to quantify how experimental factors and their interactions affect complex multivariate measurements. Statistical significance in ASCA is typically assessed by permutation testing; however, different permutation strategies imply distinct null hypotheses and exchangeability assumptions. In this study, we systematically compare three widely used approaches embedded in popular chemometric software packages, where the permutation strategy is often predefined and not always transparent to the user. The restricted permutation method shuffles observations only within experimental strata, preserving the structure of the null hypothesis. The reduced-model permutation contrasts the full ASCA model with a simplified version in which selected effects are removed. Permutation of marginal design matrices isolates interaction effects by permuting marginal matrices derived from the design matrix. We evaluate these methods on simulated datasets with varying patterns of main effects and interactions, as well as on an experimental study of feral cabbage (Brassica oleracea) under treatment and time factors. Our results show that the restricted permutation method reliably detects main effects, reduced-model permutation excels at identifying interactions, and permutation of marginal design matrices consistently captures both. By examining the assumptions and performance of each method, we provide practical guidance for selecting the optimal permutation strategy in ASCA-based chemometric analysis, particularly for balanced experimental designs. As a baseline, we additionally assessed unrestricted permutation of the raw data using two test statistics: the sum of squares and the F-ratio. The results show that, with the F-ratio, this approach also detects statistical significance accurately.
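The difference between unrestricted and restricted permutation described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`effect_ssq`, `permute_labels`, `permutation_p_value`), the simulated data, and the choice of the effect sum of squares as the only test statistic are assumptions made for illustration, and the component-analysis step of ASCA is omitted.

```python
import numpy as np

def effect_ssq(X, labels):
    """Sum of squares of the effect matrix for one factor.

    The effect matrix replaces each row of the column-centered data
    by the mean of its factor level; its squared norm is returned.
    """
    Xc = X - X.mean(axis=0)
    ssq = 0.0
    for level in np.unique(labels):
        rows = labels == level
        level_mean = Xc[rows].mean(axis=0)
        ssq += rows.sum() * np.sum(level_mean ** 2)
    return ssq

def permute_labels(labels, rng, strata=None):
    """Shuffle labels freely, or only within strata if strata is given."""
    if strata is None:
        return rng.permutation(labels)
    permuted = labels.copy()
    for stratum in np.unique(strata):
        idx = np.flatnonzero(strata == stratum)
        permuted[idx] = rng.permutation(labels[idx])
    return permuted

def permutation_p_value(X, labels, strata=None, n_perm=999, seed=0):
    """Share of permutations whose statistic reaches the observed one."""
    rng = np.random.default_rng(seed)
    observed = effect_ssq(X, labels)
    exceed = 0
    for _ in range(n_perm):
        if effect_ssq(X, permute_labels(labels, rng, strata)) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Hypothetical data: 3 treatment levels, 2 time points, 5 variables.
rng = np.random.default_rng(1)
labels = np.repeat([0, 1, 2], 10)   # factor under test
strata = np.tile([0, 1], 15)        # second factor used as strata
noise = rng.normal(size=(30, 5))
shift = np.array([0.0, 3.0, -3.0])[labels][:, None]
X_effect = noise + shift            # strong treatment effect

p_unrestricted = permutation_p_value(X_effect, labels)
p_restricted = permutation_p_value(X_effect, labels, strata=strata)
```

Passing `strata` keeps every permuted label inside its original stratum, which is the exchangeability restriction the abstract attributes to the restricted permutation method; leaving it out gives the unrestricted baseline.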
Chemometrics and Intelligent Laboratory Systems, volume 268, Article 105573. Published 2025-11-04.
Citations: 0
Impact of converting graphs into spanning trees on node and graph classification in Graph Neural Network
IF 3.8 | Zone 2, Chemistry | Q2 Automation & Control Systems | Pub Date: 2025-11-03 | DOI: 10.1016/j.chemolab.2025.105562
Mohammadmahdi Taheri , Mahdi Eftekhari , Gholamreza Aghamollaei
This paper investigates how reducing graphs to their spanning trees affects Graph Neural Network (GNN) performance in node and graph classification across six GNN architectures. The proposed approach leverages spanning trees for graph sparsification while preserving critical structural information, achieving performance comparable to or better than full-graph models. A theoretical connection is established between edge sampling via determinantal point processes (DPPs) and the transfer-current matrix, showing that minimum spanning trees effectively approximate DPP-selected subgraphs. Four edge-weighting schemes are analyzed to reveal their sparsification trade-offs. The method consistently reduces memory usage and computation while maintaining accuracy. Findings indicate that spanning tree pruning offers a scalable, theoretically grounded strategy for efficient GNN training without compromising classification accuracy. Experiments on node classification benchmarks (Cora, Citeseer, PubMed, PPI) and biological and chemical graph classification datasets (AIDS, MUTAG, PROTEINS, NCI1, IMDB-BINARY) demonstrate excellent graph classification results, notably 98.27% accuracy on AIDS, with reduced computational overhead.
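As a rough illustration of the sparsification step, the sketch below reduces a weighted graph to a minimum spanning tree with Kruskal's algorithm. It is an assumption-laden example rather than the paper's method: the function name `minimum_spanning_tree`, the toy edge list, and the edge weights are hypothetical, and the abstract does not specify the four edge-weighting schemes or the tree-construction algorithm actually used.

```python
def minimum_spanning_tree(num_nodes, weighted_edges):
    """Kruskal's algorithm with a union-find structure.

    weighted_edges is a list of (weight, u, v) tuples for an undirected
    graph. Returns the kept edges as (u, v) pairs; a connected graph
    yields exactly num_nodes - 1 of them.
    """
    parent = list(range(num_nodes))

    def find(node):
        # Follow parent links to the root, halving the path on the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    for weight, u, v in sorted(weighted_edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # edge joins two components: keep it
            parent[root_u] = root_v
            tree.append((u, v))
    return tree

# Hypothetical 4-node graph with 5 weighted edges.
edges = [(1.0, 0, 1), (2.0, 1, 2), (3.0, 0, 2), (1.5, 2, 3), (4.0, 1, 3)]
tree_edges = minimum_spanning_tree(4, edges)
print(tree_edges)  # [(0, 1), (2, 3), (1, 2)]
```

The tree keeps 3 of the 5 edges while leaving every node reachable, which is the kind of reduction in edges (and therefore in message-passing cost) that the abstract reports for GNN training.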
Chemometrics and Intelligent Laboratory Systems, volume 268, Article 105562. Published 2025-11-03.
Citations: 0