
Chemometrics and Intelligent Laboratory Systems: Latest Articles

Text mining-based profiling of chemical environments in protein–ligand binding assays across analytical techniques
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2026-04-15 Epub Date: 2026-02-05 DOI: 10.1016/j.chemolab.2026.105659
Erdem Önal, Zeynep Kalaycıoğlu
Protein–ligand binding studies are critical in drug discovery and development, as they offer valuable insights into molecular interactions that underlie biological function, disease mechanisms, and therapeutic effects. The potential of combining text mining with cheminformatics to explore trends in protein–ligand binding studies across a range of analytical techniques was evaluated in this study. Six widely used analytical techniques were selected to reveal important patterns. Utilizing an open-source Python platform (SCOPE), we analyzed over 33,000 scientific articles and more than 1.3 million chemical entities. The resulting data were visualized as two-dimensional hexbin plots, revealing trends in hydrophobicity (log P)–molecular weight (Da) for each technique. Instead of focusing solely on ligands, this study aims to characterize the overall chemical environments—including solvents, buffers, and supporting agents—associated with protein–ligand binding assays. By analyzing the physicochemical properties of compounds reported across different analytical techniques, we highlight how method-specific preferences shape the experimental design landscape. The analysis integrates unsupervised K-means clustering, multivariate principal component analysis (PCA), and nonparametric statistical testing to quantitatively compare technique-associated chemical spaces. Moreover, this study offers a data-driven perspective on methodologies and historical trends in protein–ligand binding research. It is positioned as a data-driven, method-centric literature analysis rather than a traditional narrative review.
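The clustering step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each chemical entity is reduced to a (log P, molecular weight) pair, standardizes both columns, and runs plain K-means with a deterministic farthest-first initialization. The six data points are hypothetical toy values, and none of this is taken from the SCOPE platform or the authors' code.

```python
import math

def standardize(points):
    """Scale each column to zero mean and unit variance (log P and MW differ in scale)."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(p, means, sds)) for p in points]

def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm with farthest-first initialization (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # next center: the point farthest from all centers chosen so far
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Hypothetical (log P, MW in Da) pairs: three small polar and three larger hydrophobic entities
toy = [(-1.0, 60.0), (-0.8, 75.0), (-1.2, 90.0), (3.5, 400.0), (4.0, 450.0), (3.8, 420.0)]
labels, centers = kmeans(standardize(toy), k=2)
print(labels)
```

Standardizing first matters here because molecular weight spans hundreds of daltons while log P spans only a few units; without it the distance would be dominated by molecular weight.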
Citations: 0
A graph-based soft sensor using feature expansion and multi-hop attention for melt index prediction
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2026-04-15 Epub Date: 2026-02-02 DOI: 10.1016/j.chemolab.2026.105656
Jingwen Ou, Yuhong Wang
Polypropylene is a fundamental material in consumer products and advanced technological applications, and accurate melt index (MI) prediction is critical for quality control in polymerization. Existing offline analyses of MI are time-consuming and costly, so the development of MI soft sensors has become an active research topic. The variables in the propylene polymerization process are related in a complex, nonlinear way through the polymerization reaction. Graph convolutional networks can capture the spatial dependence between variables, but they suffer from a fixed structure and insufficient propagation depth. To this end, this work proposes a Feature Expansion Multi-hop Graph Attention Network (FMGAT) framework that enhances the receptive field and captures features at multiple levels. The novelty of this framework lies in its integrated design for MI soft sensing, combining established attention and feature expansion mechanisms in a configuration tailored for polymerization processes. Attention diffusion connects nodes that are not directly connected, which increases the receptive field of each layer. FMGAT uses multi-subspace parallel computing to extract features, which effectively reduces feature homogenization. A Marginally Regression Conditional Tabular Generative Adversarial Network (MRCTGAN) is introduced to generate samples during data processing. Statistical and regression evaluation metrics are used to comprehensively study the performance of MRCTGAN and FMGAT on an industrial dataset. Results show that MRCTGAN achieves the best histogram intersection dissimilarity among the sample generation methods. Models trained on MRCTGAN-augmented data achieve an average 8.2% lower root mean square error (RMSE) than models trained on the original data. FMGAT significantly outperforms the baselines, reducing RMSE to 0.4643 g/10 min. FMGAT establishes an interpretable, robust paradigm for complex industrial process modeling.
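The idea that attention diffusion links nodes with no direct edge can be sketched as follows. The abstract does not give FMGAT's diffusion formula, so this example assumes a geometric weighting of powers of a one-hop attention matrix, sum over k of alpha * (1 - alpha)^k * A^k, which is one common choice for multi-hop attention. The three-node path graph, `alpha`, and all function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def attention_diffusion(attn, hops, alpha=0.2):
    """Return sum_{k=0..hops} alpha * (1 - alpha)**k * attn^k (assumed weighting)."""
    n = len(attn)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # attn^0
    diffused = [[0.0] * n for _ in range(n)]
    for k in range(hops + 1):
        weight = alpha * (1 - alpha) ** k
        for i in range(n):
            for j in range(n):
                diffused[i][j] += weight * power[i][j]
        power = matmul(power, attn)  # move on to the next hop
    return diffused

# Path graph 0 - 1 - 2 with row-normalized one-hop attention; nodes 0 and 2 share no edge
attn = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
diffused = attention_diffusion(attn, hops=2)
print(attn[0][2], diffused[0][2])
```

In this toy graph the one-hop weight between nodes 0 and 2 is zero, while the diffused matrix assigns them a positive weight through node 1, which is the receptive-field enlargement the abstract describes.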
Citations: 0
SVEMnet: An R package for self-validated elastic-net ensembles and multi-response optimization in small-sample mixture–process experiments
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2026-04-15 Epub Date: 2026-02-11 DOI: 10.1016/j.chemolab.2026.105660
Andrew T. Karl
SVEMnet is an R package for fitting Self-Validated Ensemble Models (SVEM) with elastic-net base learners and performing multi-response optimization in small-sample mixture–process design-of-experiments (DOE) studies with numeric, categorical, and mixture factors. SVEMnet wraps elastic-net and relaxed elastic-net models for Gaussian and binomial responses from glmnet in a fractional random-weight (FRW) resampling scheme with anti-correlated train/validation weights; penalties are selected by validation-weighted AIC- and BIC-type criteria, and predictions are averaged across replicates to stabilize fits near the interpolation boundary. In addition to the core SVEM engine, the package provides deterministic high-order formula expansion, a permutation-based whole-model test heuristic, and a mixture-constrained random-search optimizer that combines Derringer–Suich desirability functions, bootstrap-based uncertainty summaries, and optional mean-level specification-limit probabilities to generate scored candidate tables and diverse exploitation and exploration medoids for sequential fit–score–run–refit workflows. A simulated lipid nanoparticle (LNP) formulation study illustrates these tools, and simulation experiments based on sparse quadratic response surfaces benchmark SVEMnet against repeated cross-validated elastic-net baselines.
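The fractional random-weight scheme with anti-correlated train/validation weights can be illustrated with a short sketch. It assumes the construction usually described for SVEM: draw one uniform value U per run, then use -log(U) as the training weight and -log(1 - U) as the validation weight. The sketch is written in Python for illustration only; it does not call or reproduce the SVEMnet R package, and the function names are hypothetical.

```python
import math
import random

def frw_weights(n, rng):
    """Draw anti-correlated fractional weights for n experimental runs."""
    train, valid = [], []
    for _ in range(n):
        # clamp so both logarithms stay finite
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        train.append(-math.log(u))        # large when u is small
        valid.append(-math.log(1.0 - u))  # small when u is small
    return train, valid

def pearson(x, y):
    """Sample Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

train_w, valid_w = frw_weights(2000, random.Random(1))
corr = pearson(train_w, valid_w)
print(round(corr, 3))
```

Every run keeps a positive weight in both roles, so no observation is dropped outright; a run that counts heavily for fitting simply counts lightly for validation. That property is what makes the approach attractive for small designs where holding out whole runs is costly.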
Citations: 0
Fiducial inference for random-effects calibration models: Advancing reliable quantification in environmental analytical chemistry
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2026-04-15 Epub Date: 2026-02-05 DOI: 10.1016/j.chemolab.2026.105652
Soumya Sahu, Thomas Mathew, Robert Gibbons, Dulal K. Bhaumik
This article addresses calibration challenges in analytical chemistry by employing a random-effects calibration curve model and its generalizations to capture variability in analyte concentrations. The model is motivated by specific issues in analytical chemistry, where measurement errors remain constant at low concentrations but increase proportionally as concentrations rise. To account for this, the model permits the parameters of the calibration curve, which relate instrument responses to true concentrations, to vary across different laboratories, thereby reflecting the potential variability in measurement processes. The calibration curve that accurately captures the heteroscedastic nature of the data results in more reliable estimates across diverse laboratory conditions. Noting that traditional large-sample interval estimation methods are inadequate for small samples, an alternative approach, namely the fiducial approach, is explored in this work. It turns out that the fiducial approach, when used to construct a confidence interval for an unknown concentration, outperforms all other available approaches in terms of maintaining the coverage probabilities. Applications considered include the determination of the presence of an analyte and the interval estimation of an unknown true analyte concentration. The proposed method is demonstrated for both simulated and real interlaboratory data, including examples involving copper and cadmium in distilled water.
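The error behaviour described in the abstract, constant at low concentrations and proportional at high ones, can be made concrete with a small sketch. The abstract does not state the model equation, so this example assumes a two-component form y = a + b * x * exp(eta) + eps, with eta ~ N(0, sigma_eta^2) and eps ~ N(0, sigma_eps^2), for a single laboratory. The random laboratory effects on a and b and the fiducial interval itself are not implemented, and all parameter values are hypothetical.

```python
import math

def response_sd(x, slope, sigma_eps, sigma_eta):
    """Standard deviation of y = a + slope * x * exp(eta) + eps at concentration x.

    Uses Var(exp(eta)) = (exp(s^2) - 1) * exp(s^2) for eta ~ N(0, s^2).
    """
    var_exp_eta = (math.exp(sigma_eta ** 2) - 1.0) * math.exp(sigma_eta ** 2)
    return math.sqrt(sigma_eps ** 2 + (slope * x) ** 2 * var_exp_eta)

# Hypothetical parameters: additive noise 0.5, multiplicative noise about 10 %
slope, sigma_eps, sigma_eta = 1.0, 0.5, 0.1
for conc in (0.0, 1.0, 10.0, 100.0, 1000.0):
    sd = response_sd(conc, slope, sigma_eps, sigma_eta)
    print(f"x = {conc:7.1f}  sd = {sd:9.4f}")

sd_zero = response_sd(0.0, slope, sigma_eps, sigma_eta)
relative_sd_high = response_sd(1e6, slope, sigma_eps, sigma_eta) / 1e6
```

Near zero concentration the standard deviation equals the additive component, while at high concentration the relative standard deviation levels off at a constant. This is the heteroscedastic pattern that large-sample interval methods handle poorly in small interlaboratory studies.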
Citations: 0
A review of multivariate modelling for X-ray fluorescence techniques
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2026-04-15 Epub Date: 2026-02-09 DOI: 10.1016/j.chemolab.2026.105663
Dennis Silva Ferreira, Robson Almeida Silva, Gustavo Macedo Pacheco, Edenir Rodrigues Pereira-Filho, Fabiola Manhas Verbi Pereira
X-ray fluorescence (XRF) techniques have been integrated with chemometrics, enabling more robust qualitative and quantitative analysis across increasingly complex matrices. Energy-dispersive XRF (ED-XRF), despite intrinsic limitations such as matrix effects and low sensitivity to light elements, has benefited from multivariate modelling, including stacked generalization, metaheuristic variable selection, and supervised classification, improving soil fertility prediction, food authentication, and material screening. Data fusion strategies combining ED-XRF with laser-induced breakdown spectroscopy (LIBS), Raman, Fourier transform infrared spectroscopy (FTIR), near-infrared spectroscopy (NIR), or ultraviolet-visible spectroscopy (UV-Vis) further mitigate spectral redundancy and enhance the detection of light elements, supporting applications in cultural heritage, environmental monitoring, biomedical diagnostics, and forensic classification. Advances in micro- and synchrotron-based XRF have expanded analytical resolution, necessitating chemometric tools such as principal component analysis (PCA), multivariate curve resolution-alternating least squares (MCR-ALS), self-organizing map with relational perspective map (SOM-RPM), and partial least squares discriminant analysis (PLS-DA) to decompose hyperspectral datasets, validate conservation treatments, identify phase transformations, and characterize biological tissues. Total reflection XRF (TXRF) and particle-induced X-ray emission (PIXE) likewise demonstrate improved discrimination and biomarker discovery when coupled with variable-selection strategies and multivariate classification. Emerging approaches in wavelength-dispersive XRF (WDXRF), including the exploitation of valence-to-core transitions and scattering spectra with partial least squares (PLS) modelling, provide promising routes for evaluating light-element content and fuel quality. Overall, chemometrics has become indispensable for extracting meaningful chemical information from XRF data, thereby enhancing interpretability and applicability across scientific domains.
Citations: 0
Near-infrared spectroscopic prediction of gasoline olefin content: A systematic approach using continuous region feature selection and region-sensitive ensemble learning
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2026-04-15 Epub Date: 2026-02-05 DOI: 10.1016/j.chemolab.2026.105661
Jiaxue Cui, Dawei Zhang, Banglian Xu, Jianzhong Fan, Xianglong Cao
This study addresses the challenges of high-dimensional collinearity and regional information heterogeneity in near-infrared spectroscopy for gasoline olefin content prediction by proposing a systematic optimization approach combining a Continuous Region Utilizing Integrated Spectral Evaluation for Near-Infrared (CRUISE-NIR) algorithm with a Region-Sensitive Adaptive Ensemble Learning (RAEL) framework. The CRUISE-NIR algorithm shifts spectral analysis from a “point” to a “region” perspective, fully considering the physical correlation of adjacent wavelengths and chemical prior knowledge, reducing 4443 original variables to 16 key features. Meanwhile, the RAEL framework dynamically adjusts prediction weights according to sample performance characteristics in different spectral regions, achieving sample-specific precision prediction. Experimental results demonstrate that the proposed method achieves a root mean square error (RMSE) of 0.2795 and a coefficient of determination (R²) of 0.9646 on the test set, significantly outperforming traditional methods in prediction accuracy and fitting capability. Furthermore, the robustness of the framework was successfully validated on heterogeneous matrices including SWRI Diesel, IDRC Tablets, and Soil, demonstrating robust generalizability across diverse liquid and solid physical states. Experimental results indicate that prioritizing high-quality feature selection over variable quantity significantly enhances model performance. The proposed systematic framework demonstrates robust analytical capabilities for high-dimensional spectral data across diverse and complex molecular systems.
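For readers comparing the reported test-set figures, the two metrics can be computed as in the sketch below. It uses the standard definitions of RMSE and the coefficient of determination; the four reference and predicted values are hypothetical and are not taken from the paper's data.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical olefin contents (reference) and model predictions
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(rmse(y_true, y_pred), r_squared(y_true, y_pred))
```

RMSE is expressed in the units of the predicted property, while R² is unitless, so the two numbers answer different questions: how large the typical error is, and how much of the variation in the reference values the model explains.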
Citations: 0
Prediction of consolidation behavior of modified clayey soil reinforced with artificial geo-fibers using explainable artificial intelligence
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2026-04-15 Epub Date: 2026-02-04 DOI: 10.1016/j.chemolab.2026.105654
Mohammed Faisal Noaman, Moinul Haq, Sanjog Chhetri Sapkota, Mehboob Anwer Khan, Kausar Ali, Hesam Kamyab
The present study presents an integrated experimental, machine learning (ML), and explainable artificial intelligence (XAI) framework for predicting the swelling pressure and consolidation characteristics of polypropylene geo-fiber (PPGF) reinforced clayey soil. A dataset of laboratory consolidation tests that included PPGF content, coefficient of consolidation (Cv), coefficient of compressibility (av), compression index (Cc), coefficient of volume change (mv), settlement (S), and swelling pressure values (ps) was compiled. The experimental observations revealed that Cc, mv, and S decreased on average by about 39.5%, 45.31%, and 90%, respectively, at the optimum PPGF content of 0.3%, demonstrating the effectiveness of reinforcing fibers in restraining time-dependent deformation. Six machine learning models, namely KNN, SVM, ANN, DT, RF, and XGB, were developed using five-fold cross-validation. The XGB regressor showed the best predictive performance, with an R² of 0.994 (RMSE of 3.14) on the training set and good generalizability on the test set, with an R² of 0.913 (RMSE of 14.05). The remaining models demonstrated comparatively weaker performance, with ANN and DT exhibiting pronounced overfitting, while KNN and SVM failed to adequately capture the nonlinear swelling response of the gels. The XAI analysis using SHAP indicates that polypropylene geo-fiber content is the most influential factor governing swelling pressure, followed by mv and soil compressibility. An interactive graphical user interface was built on the optimized XGB model to predict and visualize swelling pressure in real time from user inputs. The proposed model integrates experimental validation with robust predictive capability and interpretability, and is complemented by a user-friendly interface and a reliable decision-support system for geotechnical design and soil improvement.
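The five-fold cross-validation mentioned in the abstract can be sketched as an index-splitting routine. The abstract does not describe how the folds were formed, so shuffling with a fixed seed and the sample count of 23 are assumptions made for illustration; fitting the six regressors is left out.

```python
import random

def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train, valid) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        valid = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, valid

# Hypothetical dataset of 23 consolidation tests
splits = list(kfold_indices(23, k=5))
for train, valid in splits:
    print(len(train), len(valid))
```

Each test appears in exactly one validation fold, so every sample contributes to the validation score once. Comparing those validation scores with the training scores is one way to spot the kind of overfitting the abstract reports for ANN and DT.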
Published in Chemometrics and Intelligent Laboratory Systems, vol. 271, Article 105654, 2026-04-15. DOI: 10.1016/j.chemolab.2026.105654
Citations: 0
A directional multi-LSTM framework integrated BERT for S-sulfhydration sites prediction
IF 3.8, Zone 2 (Chemistry), Q2 AUTOMATION & CONTROL SYSTEMS, Pub Date: 2026-04-15, Epub Date: 2026-01-30, DOI: 10.1016/j.chemolab.2026.105653
Zhanchang Zhang, Qiao Ning, Xulun Shi, Shikai Guo, Hui Li
Protein S-sulfhydration is an important post-translational modification that regulates signaling pathways in animal cells by influencing protein activity and function. It also plays a crucial role in regulating plant metabolism and morphogenesis. Identifying S-sulfhydration sites is therefore crucial for cellular biology research. In this study, we propose a deep learning framework that uses a directional multi-LSTM (Long Short-Term Memory) network to predict protein S-sulfhydration sites. First, protein sequence data are preprocessed with an improved BERT strategy to extract high-dimensional sequence features. Hypothesizing that S-sulfhydration modification exhibits directionality, we partition sequences around cysteine residues and extract features with the directional multi-LSTM, simulating the enzymatic reaction conditions. A convolutional neural network (CNN) is then employed to capture deep local information features. On an independent test set, the accuracy, sensitivity, specificity, Matthews correlation coefficient, area under the curve, and precision are 76.76%, 85.45%, 67.21%, 53.77%, 76.33%, and 74.11%, respectively. The results demonstrate that the directional multi-LSTM deep learning framework is an effective tool for predicting protein S-sulfhydration. The source code is available at https://github.com/endeavor-zzc/Multi-LSTM.
Published in Chemometrics and Intelligent Laboratory Systems, vol. 271, Article 105653, 2026-04-15. DOI: 10.1016/j.chemolab.2026.105653
Citations: 0
Solutions reproducibility in multiresponse optimization problems: A new desirability-based objective function
IF 3.8, Zone 2 (Chemistry), Q2 AUTOMATION & CONTROL SYSTEMS, Pub Date: 2026-04-15, Epub Date: 2026-01-28, DOI: 10.1016/j.chemolab.2026.105650
Nuno Costa, João Lourenço
Theoretical solutions to multiresponse problems may not yield the expected results when implemented in practice at the process and/or product level. In problems developed under the Response Surface Methodology framework, overlooked causes of such discrepancies include the magnitude of prediction errors in some regions of the solution space, unreplicated experimental runs, and the sensitivity of responses to changes in the values of the response model variables. This discrepancy must be minimized, and it can be managed when optimal solutions are generated. To improve the reproducibility of theoretical solutions, a new desirability-based objective function is therefore proposed. It allows response bias, prediction quality, robustness, and resilience to be balanced according to the decision maker's preferences. Two case studies demonstrate its flexibility and usefulness.
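For readers unfamiliar with desirability-based optimization, the sketch below shows the classical Derringer–Suich construction that such objective functions build on: each response is mapped to an individual desirability in [0, 1], and the values are combined by a geometric mean. This is background only and is not the new objective function proposed in the abstract above. All function names, bounds, and weights are hypothetical.

```python
import math


def desirability_larger_is_better(y, low, target, weight=1.0):
    """One-sided desirability: 0 at or below `low`, 1 at or above `target`."""
    if y <= low:
        return 0.0
    if y >= target:
        return 1.0
    return ((y - low) / (target - low)) ** weight


def desirability_smaller_is_better(y, target, high, weight=1.0):
    """One-sided desirability: 1 at or below `target`, 0 at or above `high`."""
    if y <= target:
        return 1.0
    if y >= high:
        return 0.0
    return ((high - y) / (high - target)) ** weight


def overall_desirability(ds):
    """Geometric mean of individual desirabilities.

    A single unacceptable response (d = 0) makes the overall value 0.
    """
    if any(d == 0.0 for d in ds):
        return 0.0
    return math.exp(sum(math.log(d) for d in ds) / len(ds))
```

The geometric mean is the usual choice because it prevents a good value on one response from compensating for an unacceptable value on another.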
Published in Chemometrics and Intelligent Laboratory Systems, vol. 271, Article 105650, 2026-04-15. DOI: 10.1016/j.chemolab.2026.105650
Citations: 0
Non-invasive diagnosis of common glomerular diseases via Raman spectroscopy and machine learning: an integrated blood and urine analysis approach
IF 3.8, Zone 2 (Chemistry), Q2 AUTOMATION & CONTROL SYSTEMS, Pub Date: 2026-03-15, Epub Date: 2026-01-21, DOI: 10.1016/j.chemolab.2026.105630
Mengyu Wu, Yuan Cao, Ruiyang Wang, Chongxuan Tian, Yang Li, Zunsong Wang

Background

Percutaneous renal biopsy faces three major challenges in clinical management: inherent procedural risks, inability to serially monitor disease activity, and sampling variability. These limitations underscore the demand for safer, repeatable diagnostic tools.

Objective

Our objective was to explore the potential of a liquid biopsy strategy utilizing paired blood and urine analysis via Raman spectroscopy and a 1D-CNN to facilitate the differentiation of common glomerular diseases from each other and from healthy individuals.

Methods

From January 2021 to January 2025, we collected serum and first-void morning urine from 170 biopsy-confirmed patients (81 membranous nephropathy, 36 IgA nephropathy, 33 diabetic nephropathy, 20 focal segmental glomerulosclerosis) and 21 healthy volunteers. Spectra were acquired on an Attenuated Total Reflection-8300 (ATR-8300) instrument (785 nm excitation) and preprocessed by third-order polynomial baseline correction and 13-point Savitzky–Golay smoothing. A 1D-CNN was trained on the combined spectral data; performance was assessed by accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve (ROC-AUC).
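The 13-point Savitzky–Golay smoothing step mentioned above can be sketched in plain Python using the tabulated convolution weights for a 13-point window. This is an illustration under stated assumptions, not the authors' pipeline. The abstract does not give the polynomial order of the filter, so a quadratic/cubic local fit is assumed. The function name `savgol13` is hypothetical, and leaving the six points at each end unsmoothed is a simplification. The third-order polynomial baseline correction is not covered here.

```python
# Tabulated 13-point Savitzky-Golay smoothing weights for a quadratic/cubic
# local polynomial fit (assumed order); the weights sum to 143.
SG13_WEIGHTS = (-11, 0, 9, 16, 21, 24, 25, 24, 21, 16, 9, 0, -11)
SG13_NORM = 143


def savgol13(intensities):
    """Smooth a spectrum with a 13-point Savitzky-Golay filter.

    Assumption: the first and last six points are copied unchanged because
    a full window is not available there.
    """
    half = len(SG13_WEIGHTS) // 2
    smoothed = list(intensities)
    for i in range(half, len(intensities) - half):
        window = intensities[i - half:i + half + 1]
        smoothed[i] = sum(w * v for w, v in zip(SG13_WEIGHTS, window)) / SG13_NORM
    return smoothed
```

A useful property of this filter is that any signal that is locally a polynomial of degree three or lower passes through unchanged, so broad Raman bands are preserved while point-to-point noise is attenuated.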

Results

The 1D-CNN model achieved 80.0% accuracy, 76.2% sensitivity, and 81.3% specificity in five-class classification. ROC-AUCs ranged from 0.81 (FSGS) to 0.85 (IgA nephropathy), confirming robust discrimination across disease subtypes and controls. Characteristic Raman bands, such as phenylalanine (~1003 cm⁻¹), amide I (~1655 cm⁻¹), and C–H stretching (2800–3000 cm⁻¹), differed systematically among cohorts, reflecting underlying biochemical alterations.

Conclusions

Raman spectroscopy of paired blood and urine, coupled with deep learning, provides a rapid, label-free approach for minimally invasive classification of glomerular diseases. This integrated liquid biopsy strategy may enable early detection and precise stratification in nephrology, reducing reliance on invasive biopsy and informing personalized therapy.
Published in Chemometrics and Intelligent Laboratory Systems, vol. 270, Article 105630, 2026-03-15. DOI: 10.1016/j.chemolab.2026.105630
Citations: 0