
Chemometrics and Intelligent Laboratory Systems: Latest Articles

Enhancing IoT anomaly detection with the Dwarf Mongoose-Chaos optimized deep belief framework
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2026-01-15 Epub Date: 2025-10-30 DOI: 10.1016/j.chemolab.2025.105558
Veena Potdar, Mohan Govindasa Kabadi
Anomaly detection is essential for identifying deviations from normal patterns in data, enabling the detection of security breaches or system faults, particularly in Internet of Things (IoT) networks. However, traditional machine learning (ML) and deep learning (DL) methods often struggle with the dynamic and complex nature of IoT environments, where attack patterns are non-linear, continuously evolving, and context-dependent. These models typically require large labeled datasets and retraining to adapt to new threats, which limits their responsiveness and scalability. Additionally, their high computational demands make real-time deployment on resource-constrained IoT devices challenging. Furthermore, many ML/DL models exhibit poor generalization, performing well in controlled scenarios but failing to maintain accuracy across diverse, real-world IoT settings with varying devices, protocols, and data distributions. To address these issues, this work proposes the Dwarf Mongoose-Chaos Optimized Deep Belief (DCODB) Framework, which combines advanced preprocessing, feature selection (FS), and classification techniques. Initial preprocessing involves Min-Max Normalization and One-Hot Encoding to scale numerical features and transform categorical data for effective model input. FS is optimized by the novel Dwarf Mongoose-Chaos Fusion Optimization (DMCFO), which is a swarm intelligence algorithm that leverages chaotic maps to improve the effectiveness of the Dwarf Mongoose Optimization Algorithm (DMO), reducing dimensionality and improving classification accuracy. The refined features are then classified using a Deep Belief Network (DBN), which processes hierarchical feature representations to differentiate between normal and anomalous behaviors in the NSL-KDD dataset. 
The proposed framework was evaluated with several metrics and achieved a Balanced Accuracy above 99 %, along with high Precision, Recall, F1 Score, Specificity, and AUC-ROC. These results indicate that the model can deliver reliable and scalable anomaly detection in IoT environments, strengthening overall security.
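The preprocessing step named in the abstract (Min-Max Normalization for numerical features, One-Hot Encoding for categorical ones) can be illustrated with a minimal sketch. The function names and the toy records below are assumptions made for illustration; they are not taken from the paper and are not real NSL-KDD rows.

```python
def min_max_normalize(column):
    """Scale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def one_hot_encode(column):
    """Return (categories, rows) with one 0/1 indicator per sorted category."""
    categories = sorted(set(column))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for value in column:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


# Hypothetical toy records (duration, protocol type), for illustration only.
durations = [0, 2, 10, 4]
protocols = ["tcp", "udp", "tcp", "icmp"]

scaled = min_max_normalize(durations)
categories, encoded = one_hot_encode(protocols)

# Concatenate the scaled numeric feature with the indicator columns.
features = [[s] + row for s, row in zip(scaled, encoded)]
```

In practice the minimum and maximum would be computed on the training split only and reused for test data, so that no information leaks from the test set.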
Citations: 0
Sampling-based computation of the sets of feasible solutions and feasible bands for noisy data
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2026-01-15 Epub Date: 2025-11-04 DOI: 10.1016/j.chemolab.2025.105565
Mathias Sawall, Tomass Andersons, Chunhong Wei, Christoph Kubis, Klaus Neymeyr
Multivariate curve resolution often suffers from solution ambiguity, with many nonnegative factorizations fitting the data equally well. Building on the algorithm of Laursen and Hobolth (2022), we present an efficient sampling algorithm that can handle noisy data even containing negative entries. The algorithm iteratively updates factor columns via affine combinations within a nested loop structure, effectively approximating the sets of feasible solutions, the feasible bands, as well as the dual profiles. We apply the algorithm to two in situ FTIR spectroscopic data sets tracking the decomposition and activation of rhodium carbonyl complexes for the hydroformylation process. A comparison against established algorithms for these data sets indicates the robustness and computational efficiency of the algorithm.
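The solution ambiguity described above can be made concrete with a small sketch: for a factorization D = C S^T, any invertible T that keeps both C T^{-1} and T S^T nonnegative yields another equally valid factorization. The code below uses plain rejection sampling over random 2x2 matrices T near the identity. This is an illustrative stand-in, not the authors' algorithm and not the method of Laursen and Hobolth; the matrices C and ST are hypothetical toy values, and no normalization of the profiles is applied.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse_2x2(t):
    """Inverse of a 2x2 matrix via the closed-form adjugate formula."""
    (p, q), (r, s) = t
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def is_nonnegative(m, tol=1e-12):
    return all(v >= -tol for row in m for v in row)


# Hypothetical toy factors: 3 time points, 2 components, 4 spectral channels.
C = [[1.0, 0.2], [0.5, 0.5], [0.2, 1.0]]
ST = [[1.0, 0.5, 0.2, 0.1], [0.1, 0.3, 0.8, 1.0]]
D = matmul(C, ST)

random.seed(0)
accepted = []
for _ in range(200):
    a, b = random.uniform(-0.3, 0.3), random.uniform(-0.3, 0.3)
    T = [[1.0, a], [b, 1.0]]          # determinant 1 - a*b stays positive here
    C_new = matmul(C, inverse_2x2(T))
    ST_new = matmul(T, ST)
    if is_nonnegative(C_new) and is_nonnegative(ST_new):
        accepted.append((C_new, ST_new))

# Channel-wise envelope of the first spectral profile over all accepted samples.
first_rows = [st[0] for _, st in accepted]
band_low = [min(col) for col in zip(*first_rows)]
band_high = [max(col) for col in zip(*first_rows)]
```

Every accepted pair reproduces D exactly (up to rounding), and the envelope between band_low and band_high is a crude approximation of a feasible band for the first component under these assumptions.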
Citations: 0
Comparison of NIR and Raman spectroscopy for determining used cooking oil properties using chemometric methods
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2026-01-15 Epub Date: 2025-10-21 DOI: 10.1016/j.chemolab.2025.105552
Ivana Hradecká, Kateřina Svobodová, Aleš Vráblík, Vladimír Hönig
This study compares the performance of near-infrared (NIR) and Raman spectroscopy in the quantitative analysis of used cooking oil (UCO), focusing on critical parameters such as acid value, density, and kinematic viscosity. Monitoring these properties ensures that the feedstock meets the necessary specifications for optimal biofuel production, contributing to the sustainability and performance of the final product. NIR and Raman spectroscopy offer significant advantages by enabling rapid, real-time, and non-destructive measurement of several properties at once.
Partial least squares (PLS) regression was employed to correlate the reference results with the spectral information obtained by NIR and Raman spectroscopy. NIR spectroscopy generally outperformed Raman spectroscopy in analyzing UCO properties. It performed better for the measurement of acid value (R²P = 0.99, RMSEP = 0.087 mg KOH g⁻¹, RPD = 8.12) and kinematic viscosity at 40 °C (R²P = 0.97, RMSEP = 0.325 mm²/s, RPD = 5.20). Raman spectroscopy was identified as the more suitable technique for the determination of density at 15 °C (R²P = 0.97, RMSEP = 0.167 kg m⁻³, RPD = 4.20). However, both techniques presented excellent results and are suitable for the accurate determination of UCO properties.
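The three figures of merit quoted above can be computed as in the following sketch. It assumes the common definitions: RMSEP as the root mean squared error on the prediction set, R²P as the coefficient of determination on that set, and RPD as the sample standard deviation of the reference values divided by RMSEP. The paper may use slightly different conventions, and the reference and predicted values below are made-up toy numbers, not data from the study.

```python
import math


def rmsep(reference, predicted):
    """Root mean squared error of prediction."""
    n = len(reference)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n)


def r_squared(reference, predicted):
    """Coefficient of determination on the prediction set."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


def rpd(reference, predicted):
    """Ratio of performance to deviation: SD of reference values / RMSEP."""
    n = len(reference)
    mean_ref = sum(reference) / n
    sd = math.sqrt(sum((r - mean_ref) ** 2 for r in reference) / (n - 1))
    return sd / rmsep(reference, predicted)


# Toy values for illustration only.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.9, 5.1]

rmsep_value = rmsep(reference, predicted)
r2_value = r_squared(reference, predicted)
rpd_value = rpd(reference, predicted)
```

Because RPD scales the error by the spread of the reference values, it allows properties with different units (mg KOH g⁻¹, mm²/s, kg m⁻³) to be compared on one scale.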
Citations: 0
Marginal region-integrated regressive conditional variational autoencoder-generative adversarial network: A soft sensing enhancement method
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2026-01-15 Epub Date: 2025-11-19 DOI: 10.1016/j.chemolab.2025.105577
Guo-yu Liu, Qun-Xiong Zhu, Yi Luo, Wei Ke, Yan-Lin He, Yang Zhang, Ming-Qing Zhang, Yuan Xu
In industrial processes, many data are difficult to obtain directly because of the limitations of actual production. This restricts the sample size and leads to uneven data distribution, ultimately degrading the fitting performance of soft sensing models. To address this challenge, we propose a marginal Isolation Mega Trend Diffusion with Regressor Conditional Variational Autoencoder-Generative Adversarial Network (IRCVGAN), designed to improve model accuracy by expanding the sample size. Specifically, the proposed method first applies the isolation forest algorithm to detect sparse marginal regions in the dataset, followed by Mega Trend Diffusion (MTD) to broaden the range of input data by generating virtual samples, thus increasing dataset diversity. Next, an improved regressive conditional Variational Autoencoder-Generative Adversarial Network (RCVAEGAN) is developed to perform fine-grained selection on the virtual samples generated by MTD. Furthermore, the mapping between input variables and production quality indicators is embedded in RCVAEGAN, enhancing the representativeness of the samples and improving the model's fitting accuracy. The effectiveness of the proposed method is validated through function fitting tests and real-world industrial data from a purified terephthalic acid (PTA) solvent system.
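The MTD step can be sketched for a single variable as follows. The bound formulas are a commonly cited formulation of Mega Trend Diffusion (a widened range around the mid-range point, with a triangular membership function); whether the paper uses exactly this variant is an assumption, and the isolation forest and RCVAEGAN stages are not shown. The function names and the sample values are hypothetical.

```python
import math
import random


def mtd_bounds(values):
    """Return (lower, center, upper) of the diffused range for one variable."""
    lo, hi = min(values), max(values)
    center = (lo + hi) / 2.0
    n_low = sum(1 for v in values if v < center)
    n_up = sum(1 for v in values if v > center)
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    spread = -2.0 * math.log(1e-20)
    lower, upper = lo, hi
    if n_low > 0 and n_up > 0:
        skew_low = n_low / (n_low + n_up)
        skew_up = n_up / (n_low + n_up)
        lower = center - skew_low * math.sqrt(spread * variance / n_low)
        upper = center + skew_up * math.sqrt(spread * variance / n_up)
    # Never shrink below the observed range.
    return min(lower, lo), center, max(upper, hi)


def generate_virtual_samples(values, count, rng):
    """Draw virtual samples, accepting each with its triangular membership."""
    lower, center, upper = mtd_bounds(values)
    if upper == lower:
        return [center] * count
    samples = []
    while len(samples) < count:
        x = rng.uniform(lower, upper)
        if x <= center:
            membership = (x - lower) / (center - lower)
        else:
            membership = (upper - x) / (upper - center)
        if rng.random() <= membership:
            samples.append(x)
    return samples


# Hypothetical small sample of one process variable.
observed = [2.1, 2.4, 2.5, 3.0, 3.8]
lower, center, upper = mtd_bounds(observed)
virtual = generate_virtual_samples(observed, 10, random.Random(0))
```

The acceptance step makes values near the center of the diffused range more likely than values near its edges, which is the role the triangular membership function plays in this formulation.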
Citations: 0
Optimization of caper bud drying using the DT_LSBOOST model: A predictive approach to improve quality and efficiency
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2026-01-15 Epub Date: 2025-11-20 DOI: 10.1016/j.chemolab.2025.105585
Chafika Lakhdari, Hocine Remini, Samia Djellal, Meriem Adouane, Hichem Tahraoui, Abdeltif Amrane, Farid Dahmoune, Merve Yavuz-Düzgün, Elif Feyza Aydar, Evren Demircan, Zehra Mertdinç, Beraat Ozçelik, Nabil Kadri
Capparis spinosa L. buds undergo salting and drying to enhance their shelf life and organoleptic properties. This study evaluates the impact of four drying methods, oven drying (OD), vacuum drying (VD), freeze-drying (FD), and microwave drying (MD), on the physicochemical, antioxidant, and microbiological properties of dried caper buds. Salting reduced the initial moisture content from 508.50 % to 168.59 % (db), while drying further decreased it to approximately 9 %. Drying time varied significantly, with MD achieving the shortest duration (0.19–0.75 h) and OD requiring the longest (up to 49.66 h). FD exhibited the highest energy consumption (60.77 kWh/kg), followed by VD, while OD and MD were the least energy-intensive (0.54–3.10 kWh/kg and 1.34–2.18 kWh/kg, respectively). FD preserved the most chlorophyll (193.63 μg/g DW) and the highest total phenolic content (TPC, 28.98 mg GAE/g DW), whereas MD at 200 W resulted in the lowest TPC (9.88 mg GAE/g DW). FD samples also showed superior antioxidant activities in both ABTS and FRAP assays. In contrast, OD and MD increased browning and degraded quality attributes. Multivariate analyses (PCA and clustering) highlighted FD as optimal for preserving quality, while MD was the most detrimental. Microbiological analysis confirmed that the dried capers met food safety standards. A predictive model using Decision Tree coupled with Least Squares Boosting (DT_LSBOOST) achieved exceptional accuracy (R = 0.9999, RMSE = 0.0564, ESP = 0.2028, MAE = 0.0305), providing a reliable tool for optimizing drying parameters. Overall, freeze-drying emerged as the best method to retain the nutritional and bioactive properties of capers, and the developed predictive model offers an innovative approach to enhancing caper processing efficiency.
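The idea behind combining decision trees with least squares boosting can be sketched in a few lines: start from the mean response, then repeatedly fit a small tree to the current residuals and add a shrunken copy of it to the model. The sketch below uses depth-one trees (stumps) on a single input. It is a generic illustration of least squares boosting, not the DT_LSBOOST configuration used in the study, and the drying curve is a synthetic exponential decay generated in the code, not measured data.

```python
import math


def fit_stump(xs, residuals):
    """Best single-threshold split of a 1-D feature under squared error."""
    best = None
    order = sorted(set(xs))
    for left, right in zip(order, order[1:]):
        threshold = (left + right) / 2.0
        lo = [r for x, r in zip(xs, residuals) if x <= threshold]
        hi = [r for x, r in zip(xs, residuals) if x > threshold]
        lo_mean, hi_mean = sum(lo) / len(lo), sum(hi) / len(hi)
        sse = sum((r - lo_mean) ** 2 for r in lo) + sum((r - hi_mean) ** 2 for r in hi)
        if best is None or sse < best[0]:
            best = (sse, threshold, lo_mean, hi_mean)
    return best[1:]


def predict(model, x):
    base, rate, stumps = model
    return base + rate * sum(lo if x <= t else hi for t, lo, hi in stumps)


def fit_lsboost(xs, ys, rounds=50, rate=0.1):
    """Least squares boosting: each stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    for _ in range(rounds):
        current = (base, rate, stumps)
        residuals = [y - predict(current, x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return base, rate, stumps


def rmse(model, xs, ys):
    return math.sqrt(sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Synthetic drying curve: moisture decays exponentially with time (toy data).
xs = [float(t) for t in range(10)]
ys = [100.0 * math.exp(-0.5 * t) for t in xs]

baseline = fit_lsboost(xs, ys, rounds=0)   # mean-only model
model = fit_lsboost(xs, ys, rounds=50)     # boosted stumps
```

Comparing rmse(model, xs, ys) with rmse(baseline, xs, ys) shows the training error shrinking as stumps are added; a real application would evaluate on held-out data to guard against overfitting.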
Citations: 0
Modeling and statistical analysis of cancer drugs using M-polynomial indices for their characteristics
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2026-01-15 Epub Date: 2025-11-24 DOI: 10.1016/j.chemolab.2025.105590
Qasem M. Tawhari, Muhammad Naeem, Saba Maqbool, Syed Muhammad Kashif Raza, Adnan Aslam
This study computes M-polynomial indices for Doxorubicin and Mitoxantrone, two widely used anticancer drugs of the anthracycline and anthracenedione classes, respectively. Doxorubicin, a potent topoisomerase II inhibitor, is commonly employed in treating various cancers, including breast cancer, ovarian cancer, and leukemia. Mitoxantrone, with its unique DNA-intercalating properties, is effective against acute myeloid leukemia, breast cancer, and non-Hodgkin's lymphoma. We derived the M-polynomial indices by partitioning the edges of each molecular graph according to the degrees of their end vertices, using the adjacency matrix. A Python algorithm based on the adjacency matrix computes the indices efficiently, reducing calculation time from days to minutes and eliminating human error. Simple linear regression models in SPSS are used to build QSPR models and predict the physical attributes of cancer medicines. Our findings show that M-polynomial indices accurately predict physical attributes, providing important insights into the structural requirements for maximum anticancer action. In addition, we propose a model for each physical attribute. This study aids in the development of new cancer therapies and the prediction of physical features for uncharacterized medications.
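The edge-partition computation can be sketched as below. The M-polynomial of a graph G is M(G; x, y) = Σ m_ij x^i y^j, where m_ij counts the edges whose end vertices have degrees i and j (i ≤ j). Degree-based indices follow from the coefficients; the first and second Zagreb indices are shown as two standard examples. The abstract does not list which indices the authors computed, so that choice, the function names, and the four-vertex path graph used as input are assumptions for illustration; this is not the authors' code.

```python
from collections import Counter


def degree_edge_partition(adjacency):
    """Count edges by the (smaller, larger) degrees of their end vertices."""
    degrees = [sum(row) for row in adjacency]
    partition = Counter()
    n = len(adjacency)
    for u in range(n):
        for v in range(u + 1, n):      # visit each undirected edge once
            if adjacency[u][v]:
                i, j = sorted((degrees[u], degrees[v]))
                partition[(i, j)] += 1
    return partition


def m_polynomial_string(partition):
    """Readable form of M(G; x, y) built from the edge partition."""
    return " + ".join(f"{m}*x^{i}*y^{j}" for (i, j), m in sorted(partition.items()))


def first_zagreb(partition):
    """M1 = sum over edges of (d_u + d_v)."""
    return sum(m * (i + j) for (i, j), m in partition.items())


def second_zagreb(partition):
    """M2 = sum over edges of (d_u * d_v)."""
    return sum(m * i * j for (i, j), m in partition.items())


# Toy input: path graph on four vertices (0-1-2-3), not a drug molecule.
path4 = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

partition = degree_edge_partition(path4)
polynomial = m_polynomial_string(partition)
m1 = first_zagreb(partition)
m2 = second_zagreb(partition)
```

For a drug molecule, the adjacency matrix would be built from the hydrogen-suppressed molecular graph, and the resulting indices would serve as predictors in the regression models.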
Citations: 0
High precision classification method for black tea: Deep learning combined with two-dimensional correlation spectroscopy
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2026-01-15 Epub Date: 2025-11-20 DOI: 10.1016/j.chemolab.2025.105580
Long Liu, Yifan Wang, Bin Wang, Xiaoxuan Xu, Jing Xu
Tea is a widely popular beverage across the globe. However, its medicinal content and value vary from one species to another. As a result, consumers need a quick and efficient method to distinguish between species. This paper introduces a method for species classification using two-dimensional correlation spectroscopy (2DCOS) images combined with deep learning (DL) models. Initially, 345 thin-section samples of five different black teas were prepared, and their near-infrared spectroscopy (NIRS) data were obtained. From the preprocessed one-dimensional NIRS data, 8280 2DCOS contour images and contour fill images were generated. A MobileNet model with various bottleneck residual blocks was constructed and trained using these 2DCOS images as samples, achieving a classification accuracy of 100 %. The model testing results indicated that the optimal NIRS preprocessing method is the Standard Normal Variate (SNV) transformation and the optimal 2DCOS image format is the contour fill image. Furthermore, the classification results of one-dimensional NIRS data, 2DCOS matrix data, and 2DCOS image data were compared, showing that the 2DCOS images provide higher classification accuracy. Finally, comparative experiments between the MobileNet model and other deep learning models demonstrated that the MobileNet model has fewer parameters, a lower computational load, high accuracy, and fast convergence. Therefore, combining 2DCOS images with the MobileNet model is effective for black tea classification. This paper offers a promising approach for the identification of black tea species, with extensive potential applications in species classification.
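The two signal-processing steps named above can be sketched as follows: SNV centers and scales each spectrum by its own mean and standard deviation, and the synchronous 2DCOS spectrum is the covariance of the mean-centered (dynamic) spectra across a series. The sketch follows the standard generalized 2DCOS definition of the synchronous map. How the authors ordered or grouped spectra to build each image is not stated in the abstract, so the grouping here is an assumption, and the three short spectra are toy values.

```python
import math


def snv(spectrum):
    """Standard Normal Variate: center and scale one spectrum by its own stats."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / sd for v in spectrum]


def synchronous_2dcos(spectra):
    """Synchronous correlation map: Phi[i][j] = sum_k d_k[i] * d_k[j] / (m - 1)."""
    m = len(spectra)
    n = len(spectra[0])
    mean = [sum(s[k] for s in spectra) / m for k in range(n)]
    dynamic = [[s[k] - mean[k] for k in range(n)] for s in spectra]
    return [
        [sum(d[i] * d[j] for d in dynamic) / (m - 1) for j in range(n)]
        for i in range(n)
    ]


# Toy series of three spectra with four channels each (illustration only).
spectra = [
    [0.10, 0.30, 0.50, 0.20],
    [0.12, 0.35, 0.55, 0.21],
    [0.15, 0.42, 0.60, 0.25],
]

processed = [snv(s) for s in spectra]
sync_map = synchronous_2dcos(processed)
```

The resulting square matrix is symmetric; rendering it as a contour or filled-contour plot gives the kind of image that a convolutional network such as MobileNet can take as input.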
Citations: 0
Extraction of soil nutrient information from visible and near-infrared signals using deep learning models
IF 3.8 Zone 2 Chemistry Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2026-01-15 Epub Date: 2025-10-29 DOI: 10.1016/j.chemolab.2025.105561
Chunru Xiong, Jufang Hu, Ken Cai, Fangxiu Meng, Qinyong Lin, Huazhou Chen
This study combines a deep learning algorithm with visible and near-infrared (Vis-NIR) spectroscopy to build a soil nutrient information extraction model. A deep learning framework based on Long Short-Term Memory (LSTM) is proposed to establish an optimal calibration model for analyzing the full range of Vis-NIR spectral data. Moreover, an influence function is designed to select the informative wavelength variables, an important goal in engineering applications of spectroscopy because it reduces model dimensionality and enhances model robustness. Experiments were performed to predict the nitrogen (N), phosphorus (P) and potassium (K) contents of soil. The modeling results showed that the proposed model improved the modeling efficiency of soil nutrient information extraction and achieved higher accuracy in both modeling and prediction than the conventional model. The approach offers an effective response to challenges in engineering applications, promotes the use of Vis-NIR spectroscopy for fast detection, and yields robust, high-precision models for soil nutrient information extraction.
Citations: 0
Quality-driven divisive K-Means: A new clustering strategy for MALDI imaging data for a more precise and less biased characterization of complex biological tissues
IF 3.8 Zone 2 Chemistry Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2026-01-15 Epub Date: 2025-11-19 DOI: 10.1016/j.chemolab.2025.105578
Ruggero Guerrini, Nina Ogrinc, Emilien Colin, Riad Tebbakha, Christophe Attencourt, Ahmed Boudahi, Sylvie Testelin, Stéphanie Dakpé, Isabelle Fournier, Ludovic Duponchel
Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry Imaging (MALDI-MSI) has become the foremost technique for molecular characterization of complex biological tissues, owing to its unparalleled sensitivity, broad molecular coverage, and high spatial resolution. While targeted analysis has traditionally dominated MALDI imaging workflows, the inherent limitations of hypothesis-driven approaches have fueled interest in untargeted strategies. Clustering, particularly k-means-based methods, has emerged as a powerful tool for exploring spectral datasets without predefined assumptions. However, conventional k-means struggles with heterogeneous spectral distributions, prompting the adoption of bisecting k-means in MALDI imaging. Despite its hierarchical structure, bisecting k-means introduces biases by arbitrarily merging or fragmenting clusters, potentially distorting biological interpretations. This study introduces Quality-Driven Divisive k-means, a novel clustering approach that retains the hierarchical nature of bisecting k-means while dynamically optimizing the number of clusters at each partitioning level. Using MALDI imaging of squamous cell carcinoma tissues from the tongue, we illustrate the potential of Quality-Driven Divisive k-means to provide a more faithful representation of molecular architectures, mitigating the distortions inherent to fixed binary partitioning. Our findings suggest that adaptive clustering methodologies could enhance spectroscopic imaging, paving the way for more accurate tissue characterization in biomedical and clinical research.
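The contrast between fixed binary splits and a per-level choice of cluster count can be illustrated with a small sketch. The abstract does not name the quality index or the stopping rule, so everything specific here is an assumption: the mean silhouette width serves as the quality measure, and `k_max`, `min_quality`, and `min_size` are hypothetical parameters. This is not the authors' algorithm, only one plausible instance of divisive k-means with a quality-chosen k.

```python
import math


def kmeans(points, k, iters=50):
    """Plain Lloyd k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest current center.
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette width; higher means tighter, better separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: a singleton contributes 0
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def divisive_kmeans(points, k_max=4, min_quality=0.5, min_size=4):
    """Recursively split clusters, choosing k at each split by silhouette."""
    leaves, stack = [], [list(range(len(points)))]
    while stack:
        idx = stack.pop()
        sub = [points[i] for i in idx]
        best_q, best_labels = -1.0, None
        if len(idx) >= min_size:
            for k in range(2, min(k_max, len(idx) - 1) + 1):
                labels = kmeans(sub, k)
                q = silhouette(sub, labels)
                if q > best_q:
                    best_q, best_labels = q, labels
        if best_labels is None or best_q < min_quality:
            leaves.append(idx)  # no split is good enough: keep as a leaf
            continue
        for c in set(best_labels):
            stack.append([i for i, lab in zip(idx, best_labels) if lab == c])
    return leaves
```

On data with three well-separated groups, a strictly bisecting scheme must first merge two of them into one branch, whereas this sketch can pick k = 3 at the first level and then stop once no further split passes the quality threshold.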
Citations: 0
Feature extraction using differential amplification singular value decomposition in Vis–NIR spectroscopy: Application to cigarette brand identification
IF 3.8 Zone 2 Chemistry Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2026-01-15 Epub Date: 2025-11-18 DOI: 10.1016/j.chemolab.2025.105579
Biao Tang, Chengbo Yang, Jianchun Li, Jingjun Wu
Rapid and accurate identification of cigarette brands is crucial for combating counterfeiting and protecting tax revenue. Vis–NIR spectroscopy combined with machine learning is a promising identification method. Nevertheless, high-dimensional spectral data contain abundant redundant information, which degrades classification accuracy. To address this challenge, this study proposes a novel feature extraction method, differential amplification singular value decomposition (DA-SVD). This method optimizes the feature projection direction by amplifying both the individual differences among samples and the overall differences between classes, thereby achieving effective dimensionality reduction of spectral data. With DA-SVD, the classification accuracy of KNN, SVM, and RF models on the test set increased significantly from 36 %, 34 %, and 30 % (based on the original data) to 98 % for all models, with precision, sensitivity, and F1 score reaching 97.86 %, 98.14 %, and 97.86 %, respectively, all outperforming conventional feature extraction methods such as LDA, SVD, and PCA. The experimental results further demonstrated that DA-SVD could achieve satisfactory classification performance without additional preprocessing steps (outlier detection and spectral denoising). In addition, the 10-fold cross-validation results confirmed the stability of the DA-SVD method, and validation on public datasets further demonstrated its generalization ability and applicability. Overall, DA-SVD provides an efficient and robust feature extraction strategy that, when combined with machine learning, enables reliable cigarette brand identification and has broad potential for other spectroscopic applications.
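The abstract reports precision, sensitivity, and F1 score for a multi-class problem without stating how the per-class values were aggregated. As a minimal sketch, assuming macro-averaging (an unweighted mean over classes, which is an assumption and not confirmed by the source), the three figures could be computed as follows; `macro_metrics` is a hypothetical helper name.

```python
def macro_metrics(y_true, y_pred):
    """Macro-averaged precision, sensitivity (recall) and F1 over all classes."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)  # true positives
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false positives
        fn = sum(1 for t, p in pairs if t == c and p != c)  # false negatives
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy labels (made up for illustration): two brands, four test samples.
precision, sensitivity, f1 = macro_metrics(
    ["a", "a", "b", "b"], ["a", "b", "b", "b"]
)
```

Under macro-averaging the F1 score is the mean of the per-class F1 values, so it need not equal the harmonic mean of the averaged precision and sensitivity.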
Citations: 0