
Chemometrics and Intelligent Laboratory Systems: Latest Articles

Implementation of artificial intelligence and multivariate analysis to analyze electrical and physicochemical properties of seawater-affected agriculture soil
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2025-08-30 DOI: 10.1016/j.chemolab.2025.105520
Ajay L. Vishwakarma, Shruti O. Varma, M.R. Sonawane, Ajay Chaudhari
The impact of salinity on soil has become a major environmental challenge due to global warming and urbanization. The electrical properties of soil depend on its physicochemical properties, salinity, moisture content, and the geological features of the land. This work evaluated the electrical and chemical properties of agricultural, riparian-zone, and near-seafront salt-marsh soils using a PC-based automated microwave X-band bench method at 9.55 GHz with the 'infinite sample' technique. Chemical properties (pH, sodium adsorption ratio (SAR), exchangeable sodium percentage (ESP), organic carbon (OC), phosphorus (P), potassium (K), and the micronutrients Fe, Mn, Cu, and Zn) and physical properties (porosity (PO), particle density (PD), and bulk density (BD)) of the soil samples were measured in triplicate using laboratory methods. Hierarchical Cluster Analysis (HCA) and Principal Component Analysis (PCA) were used to classify and differentiate the samples by their properties, revealing underlying patterns and groupings. To estimate the dielectric constant (DC) and dielectric loss (DL), we implemented Multiple Linear Regression (MLR) and a feed-forward back-propagation Artificial Neural Network (ANN). Model performance and predictive accuracy were evaluated with the Root Mean Square Error (RMSE) and the coefficient of determination (R²). With PO, BD, PD, P, OC, K, and ESP as input variables, the ANN model gave R² = 0.99 and RMSE = 9.23 × 10⁻⁴ for the dielectric constant, and R² = 0.98 and RMSE = 2.93 × 10⁻² for the dielectric loss. For MLR, R² was 0.88 for the dielectric constant and 0.80 for the dielectric loss. SHAP (SHapley Additive exPlanations) analysis of the ANN model showed that DC is influenced by ESP, while DL is only slightly affected. Thus, ANN and SHAP accurately predicted the dielectric properties of soil, offering a nondestructive and efficient approach for monitoring the effects of salinity on soil health.
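The two metrics named in the abstract can be sketched in a few lines. This is a minimal illustration under the usual textbook definitions (RMSE as the square root of the mean squared residual, R² as one minus the ratio of residual to total sum of squares); the abstract does not state which variants the authors used, and the sample values below are invented, not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted values, for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

print(rmse(observed, predicted))
print(r_squared(observed, predicted))
```

A lower RMSE and an R² closer to 1 both indicate a better fit, which is the sense in which the abstract compares the ANN and MLR models.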
Citations: 0
Sharpness-aware minimization with physics-informed regularizations for predicting semiconductor material properties in molecular dynamics
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2025-08-30 DOI: 10.1016/j.chemolab.2025.105511
Dong-Hee Shin, Young-Han Son, Tae-Eui Kam
In recent years, the growing adoption of artificial intelligence across diverse scientific fields has significantly increased demand for advanced semiconductor chips, necessitating innovations in semiconductor material design. Accurate prediction of semiconductor material properties is essential for improving chip performance, as these properties directly affect electrical, thermal, and mechanical characteristics. Traditionally, density functional theory has been the gold standard for atomic-scale simulations in material property prediction; however, its high computational cost limits scalability. Molecular dynamics simulations provide a scalable alternative by leveraging the power of machine learning force fields (MLFFs); however, semiconductor systems present unique challenges due to non-equilibrium dynamics, surface defects, and impurities. These factors often result in out-of-distribution (OOD) atomic configurations, which can significantly degrade model performance. To address this challenge, we propose Physics-Informed Sharpness-Aware Minimization (PI-SAM), a novel framework designed to enhance the prediction of semiconductor material properties across diverse datasets and challenging OOD scenarios. Specifically, PI-SAM leverages sharpness-aware minimization to achieve flatter loss minima, improving the model’s generalization. Additionally, it incorporates physics-informed regularizations to enforce energy-force consistency and account for potential energy surface curvature, ensuring alignment with the underlying physical principles governing semiconductor behavior. Experimental results demonstrate that our PI-SAM outperforms competing methods, especially on OOD datasets, underscoring its effectiveness in improving generalization.
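The abstract describes PI-SAM only at a high level. As a minimal sketch of the generic sharpness-aware minimization update that it builds on (perturb the weights along the normalized gradient by a radius rho, then descend using the gradient taken at the perturbed point), the following toy example uses a hand-written quadratic loss. The loss, learning rate, and rho are assumptions chosen for illustration; the physics-informed regularizers and the force-field model from the paper are not reproduced.

```python
import math


def loss(w):
    # Toy quadratic with one sharp and one flat direction (assumed for illustration).
    return 0.5 * (10.0 * w[0] ** 2 + 1.0 * w[1] ** 2)


def grad(w):
    # Analytic gradient of the toy loss above.
    return [10.0 * w[0], 1.0 * w[1]]


def sam_step(w, lr=0.05, rho=0.1):
    """One SAM update: move to a nearby worst-case point, then descend with its gradient."""
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) or 1.0  # guard against a zero gradient
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]  # ascent step of radius rho
    g_adv = grad(w_adv)  # gradient at the perturbed weights
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]  # apply it to the original weights


w = [1.0, 1.0]
for _ in range(50):
    w = sam_step(w)

print(loss(w))
```

Because the descent direction comes from the perturbed point, the iterate settles in a small neighborhood of the minimum rather than exactly on it; the size of that neighborhood scales with rho.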
Citations: 0
Beer's linguistics and chemistry: an investigation opening new research perspectives
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2025-08-30 DOI: 10.1016/j.chemolab.2025.105521
Nicola Cavallini, Francesco Savorani, Rasmus Bro, Marina Cocchi
In the last two decades, interest in food production and consumption has progressively grown, alongside the booming popularity of craft beer, fueled by micro-breweries and home brewing. Beer is a complex mixture of compounds — from carbohydrates to proteins and ethanol — shaped by the recipe, ingredients, and production process. Less obvious is that the human tongue, in synergy with the oral cavity and nose, acts as a powerful sensor array. Tasting experiences can be viewed as “analytical sessions”, where sensory signals processed by the brain determine not only if the beer is appreciated but also which tastes and flavours are perceived.
In our study, we investigated the connection between the “objective” chemical profile of beer and the “subjective” sensory descriptions from user reviews. We analysed 88 beers using near-infrared (NIR), visible, and nuclear magnetic resonance (NMR) spectroscopy, pairing them with text reviews processed through natural language processing (NLP) tools and converted into numerical data via a bag-of-words approach. Principal Component Analysis-Generalized Canonical Analysis (PCA-GCA) revealed correlations between chemical signals and topics like “hops,” “brown colour,” and “booze”. NMR data showed the strongest correlations, especially for hops-related terms, while visible spectra linked to colour descriptors. Automated topic extraction often performed comparably to manual term selection, suggesting potential for scalable studies. Despite limitations like dataset size and beer variety, this approach shows promise for aligning chemical composition with sensory perception, with applications for product development and broader food analysis.
A novel approach integrates text corpora with analytical data through chemometrics, linking language complexity to instrumental responses. Results showed strong correlations, like NMR signals with hops-related terms and visible spectra with beer colour. This previously unexplored connection opens the door to designing food products tailored to consumer preferences. The approach is broadly applicable, from food science to medical diagnosis or aligning expert opinions with factual data.
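The bag-of-words step mentioned in the abstract, which turns review text into numerical data, can be sketched as follows. This is a minimal illustration that assumes simple lowercase tokenization and raw term counts; the abstract does not describe the authors' actual NLP pipeline, and the two reviews are invented.

```python
import re
from collections import Counter


def bag_of_words(reviews):
    """Turn a list of review strings into (vocabulary, count matrix)."""
    # Lowercase each review and keep only runs of letters as tokens.
    tokenized = [re.findall(r"[a-z]+", text.lower()) for text in reviews]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        # One row per review, one column per vocabulary term.
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


# Hypothetical reviews, invented for illustration.
reviews = ["Hoppy and bitter, very hoppy", "Brown colour, sweet malt"]
vocab, X = bag_of_words(reviews)

print(vocab)
print(X)
```

Each row of the resulting matrix is a numerical description of one review, so it can be paired with a spectrum of the same beer in a multi-block analysis.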
Citations: 0
Not from scratch: Explainable deep transfer learning fine-tunning with domain adaptation enables trustworthy COVID-19 prediction
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2025-08-28 DOI: 10.1016/j.chemolab.2025.105517
Bingqiang Zhao, Honglin Zhai, Tianhua Wang, Haiping Shao, Ling Zhu
Medical image analysis can help diagnose Coronavirus Disease 2019 (COVID-19) early and save patients' lives before the disease worsens. However, manual inspection of these images has limitations, such as dependence on physician experience and subjective assessment. To enable fast and precise diagnosis, we propose XDTLMI-Net, a framework built on four CNNs (GoogLeNet, ResNet18, ResNet50, ResNet101) suited to image data processing. The framework uses existing medical domain knowledge to guide transfer learning for COVID-19 computed tomography (CT) scans and chest X-ray (CXR) images. XDTLMI-Net performed three COVID-19 medical image classification tasks on three public datasets: COVID-19 CT, SARS-COV-2 CT, and COVID-19 CXR. It achieved average classification accuracies of 0.9897, 0.9752, and 0.9397 and average F1-scores of 0.9898, 0.9741, and 0.9394, respectively. We also used SHapley Additive exPlanations (SHAP) and Gradient-weighted Class Activation Mapping (Grad-CAM) to interpret the COVID-19 predictions and to clarify the models' decision-making process. Overall, XDTLMI-Net is a general end-to-end framework based on CNNs and transfer learning that works on small medical image datasets and requires no segmentation or image preprocessing. It performed strongly on all three datasets during fine-tuning and produced reasonable importance attributions for each input COVID-19 image, showing its potential for application in different clinical scenarios.
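The accuracy and F1-score reported above can be computed from predicted and true labels as sketched below. This is a minimal binary-class illustration under the standard definitions (F1 as the harmonic mean of precision and recall); how the authors averaged scores across runs or classes is not stated in the abstract, and the labels are invented.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Hypothetical labels (1 = COVID-19, 0 = non-COVID), for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
acc, f1 = accuracy_and_f1(y_true, y_pred)

print(acc, f1)
```

Reporting both values is useful because accuracy alone can look high on an imbalanced dataset while F1 exposes missed positive cases.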
Citations: 0
FT-NIR combined with multiple intelligent algorithms for rapid identification and quantitative analysis of Iron Mineral Decoction Pieces
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2025-08-26 DOI: 10.1016/j.chemolab.2025.105512
Yangqian Wu, Yi Wan, Jin Li, Xiangyi Wen, Xiaolan Zhang, Can Zhang, Xiaoli Zhao
Calcined and Vinegar-quenched Magnetite (CVQM), Calcined and Vinegar-quenched Hematite (CVQH), Calcined and Vinegar-quenched Pyrite (CVQP), and Calcined and Vinegar-quenched Limonite (CVQL) are all iron-containing mineral decoction pieces that are easily confused because of their similar primary compositions and appearances. Their medicinal values differ significantly, however, and misuse in clinical settings could pose substantial safety risks to patients. In this study, E-eye and Fourier transform near-infrared (FT-NIR) spectroscopy combined with multivariate algorithms were employed for the qualitative identification of these four kinds of mineral decoction pieces and the quantitative prediction of their iron content. The results indicated that PCA and machine learning classification models based on E-eye data could not reliably distinguish the four types of decoction pieces, with accuracy below 80%. In contrast, applying SNV + ICO optimization to the raw FT-NIR spectra yielded machine learning classification accuracies of around 90%, an improvement of 28%–36% over analyses based on raw spectra alone. In addition, the partial least squares regression (PLSR) model for predicting iron content gave R²_C = 0.9627 and R²_P = 0.9451, indicating strong linearity and predictive accuracy. Overall, this study demonstrated that FT-NIR combined with multivariate algorithms provides an effective approach for identifying and evaluating the quality of mineral medicines with similar appearances and compositions.
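The abstract names SNV preprocessing without giving details. As a minimal sketch, assuming SNV here means the standard normal variate transform applied to each spectrum separately, the correction can be written as follows. The sample values are invented, and the n - 1 denominator in the standard deviation is a common convention rather than something the abstract specifies. The ICO step is not shown.

```python
import math


def snv(spectrum):
    """Standard normal variate: center one spectrum and scale it by its own standard deviation."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    # Sample standard deviation (n - 1 denominator); an assumed convention.
    std = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (n - 1))
    return [(x - mean) / std for x in spectrum]


# Hypothetical absorbance values, for illustration only.
raw = [0.2, 0.4, 0.6, 0.8]
corrected = snv(raw)

# A spectrum with an added baseline offset and a multiplicative gain maps to the same result.
shifted = snv([2.0 * x + 1.0 for x in raw])

print(corrected)
print(shifted)
```

The second call shows why SNV is popular for diffuse-reflectance spectra: additive offsets and multiplicative scaling between samples are removed before classification or regression.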
Citations: 0
Non-destructive aging evaluation of transformer insulation oil via Raman spectroscopy and ensemble learning with KPCA feature extraction
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2025-08-23 DOI: 10.1016/j.chemolab.2025.105514
Feng Hu, Ziyue Pu, Rongying Dai, Wendou Gan, Junchao Liang, Yulong Zhang, Mengxiao Ni, Yan Ge, Hang Wu, Penghui Chen
Aging of transformer insulating oil critically affects power system reliability. This study develops a non-destructive aging evaluation method that combines Raman spectroscopy with kernel principal component analysis (KPCA) and ensemble learning. Raman spectra were acquired through accelerated thermal aging experiments on a spectral detection platform and then preprocessed with moving-average, Savitzky-Golay, and Gaussian filtering. Raman features were extracted using KPCA with four kernel functions (linear, polynomial, Gaussian, and sigmoid), and the resulting evaluation performance was compared using a decision tree. Finally, four weak classifiers (DT, LDA, SVM, and BPNN) were integrated to build the ensemble learning evaluation model. Gaussian filtering achieved the highest signal-to-noise ratio (35.23 dB), Gaussian-kernel KPCA gave the best feature extraction with an average accuracy of 96.88%, and the BPNN ensemble learning evaluation model delivered the highest accuracy, 99.6%. To verify the benefits of KPCA for feature extraction and the robustness of the model, the study also ran a comparison against traditional principal component analysis (PCA) and introduced noise of various types and intensities into the test set. The model effectively evaluates the aging state of transformer insulating oil and is highly resistant to interference, providing a new method for improving transformer condition monitoring.
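The first two steps of Gaussian-kernel KPCA, building the kernel matrix and double-centering it, can be sketched in plain Python. This is an illustration under assumptions: the kernel width sigma and the three short "spectra" are invented, and the eigendecomposition of the centered matrix that yields the actual components is left out. It is not the authors' implementation.

```python
import math


def gaussian_kernel_matrix(X, sigma=1.0):
    """K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return K


def center_kernel(K):
    """Double-center K so the implicit feature vectors have zero mean."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [
        [K[i][j] - row_means[i] - col_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


# Hypothetical two-point "spectra", for illustration only.
spectra = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
K = gaussian_kernel_matrix(spectra, sigma=1.0)
Kc = center_kernel(K)

print(K)
print(Kc)
```

Swapping the body of the kernel function for a linear, polynomial, or sigmoid expression gives the other three variants the abstract compares, while the centering step stays the same.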
Citations: 0
Application of multi-omics fusion technique based on Raman spectroscopy and metabolomics in early diagnosis and activity prediction of systemic lupus erythematosus
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2025-08-20 DOI: 10.1016/j.chemolab.2025.105513
Pei Liu, Xuguang Zhou, Xiaoyi Lv, Cheng Chen, Xiaomei Chen, Cainan Luo, Xue Wu, Chen Chen, Lijun Wu
Combining artificial intelligence with Raman spectroscopy offers new approaches to the auxiliary diagnosis of disease. In systemic lupus erythematosus (SLE), however, pathological consistency is high and spectral information overlaps heavily, so spectral omics alone cannot deliver ideal results. Metabolomics, by contrast, directly reflects the metabolic status of an organism and gives deeper insight into its physiological and pathological states, and multi-omics fusion can effectively integrate features from different omics levels. This study therefore proposes, for the first time, a Multi-omics Decoupling-Bipartite Attentional Weighting (MDBAW) fusion model based on Raman spectroscopic omics and metabolomics data. The model accounts for both the unique and the shared representations between the omics, and adds attention weight modules at the input and output ends to give more weight to the most informative features in the two omics modalities. Experiments on three data sets show that MDBAW outperforms single-omics models and other advanced multi-omics fusion models, and improves the accuracy of SLE classification diagnosis and activity prediction. In addition, correlation analysis of the Raman spectroscopic omics and metabolomics data, together with KEGG pathway analysis, verified the interpretability of fusing these two omics for auxiliary disease diagnosis and demonstrated the ability of Raman spectroscopy to detect metabolites.
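The general idea of attention weighting across two modalities can be sketched as follows. This is a generic softmax-weighting example under stated assumptions, not the MDBAW architecture: the function names are hypothetical, and the attention scores, which a real model would learn, are fixed numbers here.

```python
import math

def softmax(scores):
    """Numerically stable softmax: non-negative weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weight(features, scores):
    """Rescale each feature by its softmax attention weight."""
    weights = softmax(scores)
    return [w * f for w, f in zip(weights, features)]

def fuse(raman, raman_scores, metab, metab_scores):
    """Weight each modality separately, then concatenate the results."""
    return (attention_weight(raman, raman_scores)
            + attention_weight(metab, metab_scores))

# Hypothetical feature vectors and attention scores for one sample.
raman_features = [0.8, 0.1, 0.4]
metab_features = [1.2, 0.3]
fused = fuse(raman_features, [2.0, 0.0, 1.0], metab_features, [0.5, 0.5])
```

Features with larger scores keep more of their magnitude after weighting, which is the sense in which an attention module gives "more weight" to informative features.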
Citations: 0
All sparse PCA models are wrong, but some are useful. Part III: Model interpretation
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2025-08-12 DOI: 10.1016/j.chemolab.2025.105498
J. Camacho, A.K. Smilde, E. Saccenti, J.A. Westerhuis, R. Bro
Sparse Principal Component Analysis (sPCA) is a popular matrix factorization that combines variance maximization and sparsity with the ultimate goal of improving data interpretation. In this series of papers we show that the factorization with sPCA can be complex to interpret even for simple data. In the first paper of the series, we demonstrated that sPCA models are limited in their ability to accurately factorize sparse, noise-free data when loadings overlap. In the second paper, we showed that sPCA algorithms based on deflation can generate artifacts in higher-order components. We also show that orthogonalizing the scores and incorporating orthonormal loadings are suitable means of avoiding large artifacts. Both approaches constrain the set of possible sPCA solutions in a very similar but poorly understood way. In this paper we study in particular the sPCA solution of Zou et al., which according to our results is the best of the sPCA algorithms considered in the series. We provide new derivations of the model equations, of the computation and interpretation of the model parameters, and of the selection of metaparameters in practical cases, making sPCA an even more powerful data modeling tool.
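To make the notion of a sparse loading concrete, here is a minimal sketch of soft-thresholded power iteration on a covariance matrix. It is a deliberately simpler scheme than the solution of Zou et al., which is commonly formulated as an elastic-net regression problem; the function names, the toy matrix `S`, and the penalty `lam` are assumptions made for illustration only.

```python
import math

def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values within [-lam, lam] become 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def sparse_loading(S, lam, iters=200):
    """One sparse loading via soft-thresholded power iteration on S."""
    p = len(S)
    v = [1.0 / math.sqrt(p)] * p  # dense unit-norm start vector
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(p)) for i in range(p)]
        w = [soft_threshold(x, lam) for x in w]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * p  # penalty too strong: every entry was zeroed
        v = [x / norm for x in w]
    return v

# Hypothetical covariance: variables 1 and 2 are strongly correlated,
# variable 3 is nearly independent and has low variance.
S = [[2.0, 1.8, 0.0],
     [1.8, 2.0, 0.1],
     [0.0, 0.1, 0.5]]
loading = sparse_loading(S, lam=0.5)
```

On this toy matrix the third entry of the loading is driven exactly to zero while the first two share the weight. That is the intended effect of the sparsity penalty, and also a reminder that the result depends on the choice of `lam`.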
Citations: 0
Coal origin identification based on visible-infrared spectroscopy and attention networks
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2025-08-08 DOI: 10.1016/j.chemolab.2025.105501
Jingyi Liu, Ba Tuan Le, Thai Thuy Lam Ha
Coal origin identification is a crucial process in the coal industry: it helps ensure coal quality and optimize supply chain management. However, the diversity of coal mine resources and rising market demands for quality have made origin identification more complex. This study proposes a coal origin identification method that combines spectroscopy with advanced machine learning based on deep attention networks. With an improved model architecture and optimization strategy, the method classifies and recognizes coal samples efficiently and precisely. The attention network is the core of the method and is used to fully exploit the latent spectral features of coal samples. Experimental results show that, compared with traditional methods, the proposed method achieves significant improvements on multiple key indicators, confirming its performance and application potential. The study provides an efficient and reliable solution for coal origin identification and supports the intelligent and precise development of the coal industry.
Citations: 0
Orthogonal long short-term memory autoencoder for semi-supervised soft sensor modeling
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2025-07-29 DOI: 10.1016/j.chemolab.2025.105499
Fangyuan Ma, Cheng Ji, Jingde Wang, Wei Sun, Jose A. Romagnoli
Data-driven soft sensors are widely used to predict hard-to-measure variables in industrial production processes. In practice, however, the number of labeled samples is limited, which degrades the accuracy of the resulting soft sensors. To address this, semi-supervised soft sensor methods combine unsupervised feature extraction with supervised establishment of the mapping correlation. The autoencoder (AE) is a commonly used feature extraction method that effectively captures the nonlinear features of a process from unlabeled data. Because a typical AE places no special constraints on the latent space output, the extracted features may be redundant, which complicates establishing the mapping correlation. A typical AE also struggles to extract the dynamic features of a process. Both issues can impair soft sensor performance. To address them, this work proposes an Orthogonal Long Short-Term Memory Autoencoder (OLAE). Adding an orthogonality constraint on the latent space output to the loss function of a Long Short-Term Memory autoencoder yields orthogonal dynamic features. The OLAE is then used in the feature extraction stage. Using Chatterjee's new coefficient, the orthogonal features related to the hard-to-measure variables are selected for establishing the mapping correlation. Given the limited number of labeled samples, a prediction model based on support vector regression is built to predict the hard-to-measure variables. Data from a penicillin fermentation process and an industrial cracking furnace are used to evaluate the effectiveness of the proposed soft sensor method.
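Two ingredients named in this abstract can be sketched in a few lines of plain Python. `chatterjee_xi` follows the standard no-ties formula for Chatterjee's rank correlation, which sorts by x, ranks y, and sums the absolute rank jumps. `orthogonality_penalty` is only one plausible form of an orthogonality constraint on latent outputs, namely the squared Frobenius norm of ZᵀZ/n − I; the abstract does not state the exact penalty used in OLAE, so that form and both function names are assumptions.

```python
def chatterjee_xi(x, y):
    """Chatterjee's xi_n(x, y), assuming no ties in x or y."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])       # sample indices sorted by x
    y_sorted = [y[i] for i in order]                   # y rearranged in x-order
    by_y = sorted(range(n), key=lambda i: y_sorted[i])
    ranks = [0] * n
    for r, idx in enumerate(by_y, start=1):            # rank 1 = smallest y
        ranks[idx] = r
    jump_sum = sum(abs(ranks[i + 1] - ranks[i]) for i in range(n - 1))
    return 1.0 - 3.0 * jump_sum / (n * n - 1)

def orthogonality_penalty(Z):
    """Squared Frobenius norm of (Z^T Z / n - I) for latent outputs Z (n x k)."""
    n, k = len(Z), len(Z[0])
    penalty = 0.0
    for a in range(k):
        for b in range(k):
            gram = sum(Z[i][a] * Z[i][b] for i in range(n)) / n
            target = 1.0 if a == b else 0.0
            penalty += (gram - target) ** 2
    return penalty

# Hypothetical screening step: keep latent features whose xi with the
# target exceeds a chosen threshold.
target = [2.0, 4.0, 6.0, 8.0, 10.0]
feature = [1.0, 2.0, 3.0, 4.0, 5.0]
keep = chatterjee_xi(feature, target) > 0.3
```

For a perfectly monotone relationship with n samples and no ties, the formula reduces to 1 − 3/(n + 1), so the five-point example gives exactly 0.5 and the value approaches 1 only as n grows. Unlike Pearson correlation, xi is not symmetric and does not encode direction: it measures how well y can be described as a function of x.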
Citations: 0