Chemometrics and Intelligent Laboratory Systems: Latest Articles

Eyes on every node: Adaptive neighborhood perception for spatiotemporal data intelligent modeling and its industrial application
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2026-03-15 Epub Date: 2026-01-13 DOI: 10.1016/j.chemolab.2026.105633
Yalin Wang, Ruikai Yang, Chenliang Liu, Zhongmei Li, Yijing Fang, Weihua Gui
Accurate online prediction of key quality variables is a guiding indicator for process optimization and stable operation in industrial processes. Due to the continuous and occasionally abrupt nature of industrial processes, industrial data often exhibit complex spatiotemporal coupling characteristics across long-range spatial and adjacent temporal dimensions. In particular, the dynamic variation of local spatiotemporal neighborhood space makes it challenging for traditional methods to capture these patterns. To address this issue, this paper proposes a novel neighborhood attention-aware spatiotemporal manifold autoencoder (NA-STMAE) model for soft sensor modeling of quality variables, which is designed to learn adaptive correlations within spatial and temporal neighborhoods of industrial data. Specifically, a novel attention-based neighborhood computing mode is designed to dynamically allocate weights among local samples, enabling adaptive perception and refinement of neighborhood relationships. Based on this, an attention-aware spatiotemporal neighborhood feature extraction module is developed to learn local spatiotemporal dependencies, thereby enhancing the predictive performance of the proposed soft sensor model. Finally, extensive experiments were conducted on two industrial processes to validate the effectiveness of the proposed model. Experimental results demonstrate that the proposed model outperforms several mainstream soft sensor models in prediction tasks. Moreover, ablation experiments further confirm the critical role of dynamic weight allocation in capturing both temporal and spatial dimensions.
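The attention-based neighborhood weighting mentioned in this abstract can be illustrated with a generic sketch. It is not the NA-STMAE architecture: it assumes Euclidean k-nearest neighbors and a plain scaled dot-product score followed by a softmax. The function name `neighborhood_attention`, the parameter `k`, and the toy data are hypothetical.

```python
import numpy as np

def neighborhood_attention(X, i, k=3):
    """Return neighbor indices, softmax weights, and the weighted neighborhood feature for sample i."""
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X - X[i], axis=1)
    dist[i] = np.inf                                # exclude the query sample itself
    idx = np.argsort(dist)[:k]                      # k nearest neighbors (assumed Euclidean)
    scores = X[idx] @ X[i] / np.sqrt(X.shape[1])    # scaled dot-product scores (assumed form)
    scores = scores - scores.max()                  # stabilise the softmax numerically
    weights = np.exp(scores) / np.exp(scores).sum() # weights are positive and sum to 1
    return idx, weights, weights @ X[idx]

# Toy data for illustration only.
X = np.array([[1.0, 0.0], [0.9, 0.1], [1.1, -0.1], [0.0, 1.0], [5.0, 5.0]])
idx, weights, feature = neighborhood_attention(X, i=0, k=3)
```

Because the weights are recomputed for every query sample, the contribution of each neighbor adapts to the local data rather than being fixed in advance, which is the general idea behind dynamic weight allocation.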
Volume 270, Article 105633
Citations: 0
A sustainable and straightforward approach for the authentication of roasted coffee samples based on absorption spectrophotometry coupled with Data Driven-Soft Independent Modelling of Class Analogy
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2026-03-15 Epub Date: 2026-02-04 DOI: 10.1016/j.chemolab.2026.105658
Federico N. Castañeda, Clara Parzanese, Mario R. Reta, Cecilia B. Castells, Juan Aspromonte, Rocío B. Pellegrino Vidal
Coffee is one of the world's most consumed beverages and the second most traded commodity. Natural roasted coffee, produced by heating green beans to develop its characteristic flavor, is highly appreciated. A variant, torrefacto coffee, incorporates sugar during roasting. While legitimate, this practice can be used to mask off-flavors from lower-quality beans and artificially increase weight, sometimes leading to mislabeling. This study proposes a simple, robust method to authenticate roasted coffee using UV absorbance spectroscopy and a one-class modeling algorithm (Data Driven-Soft Independent Modelling of Class Analogy). Samples of natural, torrefacto, and in-lab adulterated coffee (10%, 25%, and 50%) underwent a water extraction and were diluted for absorbance measurements (200 to 400 nm). A discriminant model was built using the first four principal components, which explained 99.7% of the spectral variance. Trained on 80% of the natural coffee samples, the model was validated on a test set containing the remaining natural, torrefacto, and adulterated samples. The method proved highly effective, detecting adulteration levels as low as 10%. It achieved 100% sensitivity, 97% specificity, and 97% overall accuracy. A White Analytical Chemistry assessment yielded an 86.2% whiteness score, confirming a strong balance between sustainability and analytical performance.
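One-class models of the SIMCA family judge a new sample by two distances to a PCA model of the target class: a score distance inside the model plane and an orthogonal distance to it. The sketch below computes both with NumPy on synthetic data. It is a simplified illustration, not the study's pipeline: the acceptance threshold that DD-SIMCA derives from chi-square distributions fitted to these distances is omitted, and the function names `fit_pca` and `sd_od` and all data are hypothetical.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit PCA on target-class data via SVD of the mean-centred matrix."""
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    loadings = Vt[:n_components].T
    eigvals = s[:n_components] ** 2 / (X.shape[0] - 1)           # variance of each score
    explained = (s[:n_components] ** 2).sum() / (s ** 2).sum()   # fraction of variance kept
    return mean, loadings, eigvals, explained

def sd_od(X, mean, loadings, eigvals):
    """Score distance (within the model plane) and orthogonal distance (residual)."""
    Xc = X - mean
    scores = Xc @ loadings
    sd = (scores ** 2 / eigvals).sum(axis=1)
    residual = Xc - scores @ loadings.T
    od = (residual ** 2).sum(axis=1)
    return sd, od

# Synthetic target class: two latent directions in five variables plus small noise.
rng = np.random.default_rng(0)
train = np.zeros((30, 5))
train[:, :2] = rng.normal(size=(30, 2))
train += 0.01 * rng.normal(size=(30, 5))

mean, loadings, eigvals, explained = fit_pca(train, n_components=2)
_, od_train = sd_od(train, mean, loadings, eigvals)
# A sample that leaves the model plane gets a much larger orthogonal distance.
_, od_new = sd_od(np.array([[0.0, 0.0, 5.0, 0.0, 0.0]]), mean, loadings, eigvals)
```

Only target-class samples are needed to fit such a model, which matches the abstract's description of training on natural coffee alone and testing on natural, torrefacto, and adulterated samples.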
Volume 270, Article 105658
Citations: 0
Robust integrated chemometric driven approach for the analysis of cilnidipine and chlorthalidone in biological and pharmaceutical matrices
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2026-03-15 Epub Date: 2026-01-13 DOI: 10.1016/j.chemolab.2026.105637
Suraj R. Chaudhari, Atul A. Shirkhedkar
Cilnidipine (CIL) and chlorthalidone (CHL), both antihypertensive agents, are approved for combined regimens for the management of hypertension. A thorough evaluation of their intrinsic stability and content levels in commercially available preparations and biological samples requires a simple and reliable analytical approach. Herein, computational approaches, stability investigations, and retrospective analysis of content uniformity results were explored to support the robustness and long-term suitability of the proposed protocol for routine application. To that end, this work established an ultra-fluid liquid chromatography method with diode array detection (UFLC-PDA) for the simultaneous separation and quantification of CIL and CHL in Cilacar C and Nexovas CH tablets and in biological matrices. The applicability of the established protocol was confirmed against the ICH Q2(R2), Q1A(R2), and Q1B recommendations. Analytes were extracted in a single simple step and analyzed on a rapid resolution ZORBAX Eclipse C18 column (4.6 mm internal diameter × 100 mm length, 3.5 μm particle size) maintained at a column oven temperature of 33 °C. Resolution was achieved using binary gradient elution at 0.5 mL/min with a solvent system comprising H2O:ACN (25.85:74.15, % v/v). CHL and CIL were detected at retention times (tR) of 2.221 ± 0.003 min and 4.435 ± 0.011 min, respectively, with a total run time <8.0 min. The proposed protocol demonstrates outstanding specificity and sensitivity, offering a systematic platform for developing and refining knowledge related to a UFLC-PDA procedure. Moreover, it shows a comprehensive understanding of the procedure to meet the requirements specified in ICH Q14.
Volume 270, Article 105637
Citations: 0
Explainable AI for secure and accurate prediction of bacteriophage virion proteins using NLP descriptors and transformer-guided ideal proximity matrix reconstruction
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2026-03-15 Epub Date: 2026-01-23 DOI: 10.1016/j.chemolab.2026.105648
Naif Almusallam, Maqsood Hayat
The biological functions of bacteria are significantly affected by bacteriophage virion proteins (BVPs), the structural proteins of bacteriophages, which are viruses that infect bacteria. BVPs play a major role in phage therapy and genetic engineering. Secure and accurate identification of these proteins is essential for understanding phage-host interactions and for bioinformatics and medical applications. However, ensuring privacy and robustness in computational models is challenging, especially when handling complex biological data. Previous works relied on wet-lab experiments and had limited scalability, incomplete feature coverage, and low generalization ability. In this study, we introduce a privacy-preserving and adversarially robust deep learning framework. It integrates natural language processing (NLP) descriptors with transformer-guided ideal proximity matrix reconstruction to capture rich information from protein sequences. For post-hoc interpretability, we use SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME). These techniques increase transparency and confidence in predictions. SHAP analyzes the dataset to identify the most significant proximity-based and NLP-derived descriptors at global and class levels. LIME provides instance-specific explanations, emphasizing local decision boundaries for particular predictions. The proposed model achieved 95.75 % and 90.27 % accuracy on the training and independent datasets, respectively. We calculated statistical measures, such as Chi-Square and P-value, for each dataset to demonstrate reliability. Our model improves predictive outcomes, transparency, and security. The empirical results validate its outstanding performance compared to existing models, while preserving security and explainability. This makes it suitable and reliable for real-world applications in proteomics and bioinformatics.
Volume 270, Article 105648
Citations: 0
A mechanistic causality-guided robust dynamical probabilistic latent variable regression model and its application to soft sensing of continuous chemical processes
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2026-03-15 Epub Date: 2026-01-13 DOI: 10.1016/j.chemolab.2026.105638
Wenxue Han, Ziteng Zuo, Xiangjing Zhang, Lan Zhang, Weiming Shao
Precisely predicting quality variables is crucial for advanced process control and real-time optimization in continuous chemical processes. Soft sensing technology is utilized for this task due to its advantages of real-time capability and low cost. Dynamical probabilistic latent variable regression (DPLVR) models for soft sensing modeling have attracted increasing attention, owing to their superior feature extraction capability. Nevertheless, the DPLVR-based soft sensing methods only account for variable correlations while neglecting the underlying causal mechanisms. Current research on causal methods primarily focuses on the selection of causal variables and the construction of causal graphs, failing to effectively integrate the causal priors that reflect the underlying mechanisms of chemical processes. In addition, outliers in chemical data further degrade the prediction accuracy of soft sensors, making them inadequate for practical production requirements. Given the above problems, a novel mechanistic causality-guided robust DPLVR (MCR-DPLVR) model is proposed for predicting the quality variables. In the MCR-DPLVR, the mechanistic causality knowledge is used to identify the causal mechanisms among different types of variables, and the Student’s t distribution is utilized to enhance the model’s robustness against outliers. Subsequently, an efficient semi-supervised training algorithm is developed to train the MCR-DPLVR based on the expectation–maximization algorithm. Furthermore, the effectiveness of the MCR-DPLVR is verified by a synthetic numerical case and an actual hydrogen production process, which exhibits the superiority of the MCR-DPLVR in comparison to several cutting-edge methods.
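The reason a Student's t distribution adds robustness can be seen in the standard expectation-maximization treatment of a multivariate t model: each sample receives the weight $(\nu + d)/(\nu + \delta_i^2)$, where $d$ is the dimension, $\nu$ the degrees of freedom, and $\delta_i^2$ the squared Mahalanobis distance of sample $i$. The sketch below shows this general property only; it is not the MCR-DPLVR training algorithm, and the function name `t_weights` and the toy values are hypothetical.

```python
import numpy as np

def t_weights(X, mu, Sigma, nu):
    """E-step weights of a multivariate Student's t model: (nu + d) / (nu + delta^2)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    d = X.shape[1]
    diff = X - mu
    # Squared Mahalanobis distance of every row to the centre mu.
    delta2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(Sigma), diff)
    return (nu + d) / (nu + delta2)

# Toy data for illustration: the last row plays the role of an outlier.
X = np.array([[0.0, 0.0], [1.0, -1.0], [8.0, 8.0]])
w = t_weights(X, mu=np.zeros(2), Sigma=np.eye(2), nu=4.0)
```

Samples far from the centre get weights close to zero, so they contribute little to the parameter updates, whereas a Gaussian model would weight every sample equally.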
Volume 270, Article 105638
Citations: 0
Smoothed Power-Weakness Ratio (sPWR): a new informative system for multi-criteria decision making
IF 3.8 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2026-03-15 Epub Date: 2025-12-29 DOI: 10.1016/j.chemolab.2025.105624
Viviana Consonni, Davide Ballabio, Enmanuel Cruz Muñoz, Veronica Termopoli, Roberto Todeschini
Nowadays, the large number of measurable variables has considerably increased the complexity of data. In the framework of the decision-making process, this leads to the need for adequate tools to set priorities and rank the available options. Ordering is one of the possible ways to analyse multivariate data, and it provides an overview of the relationships among the elements of a system. Multi-Criteria Decision Making (MCDM) encompasses a broad set of methods designed to produce priority-based lists of alternatives from multiple criteria in support of decision problems. Among the most widely adopted techniques, TOPSIS, dominance-based approaches, the Analytic Hierarchy Process (AHP), and Copeland scores represent some of the classical methodologies in both theoretical research and applied decision analysis.
Among the dominance-based approaches, an effective MCDM method is the Power-Weakness Ratio (PWR), which generates a tournament table (i.e., the pairwise comparison matrix) from a data matrix with a varying number of samples (i.e., alternatives to be compared) and variables (i.e., the criteria for pairwise comparisons), weighted according to their relative importance in determining the final ranking. In this study, a variant of the classical Power-Weakness Ratio is presented, significantly modifying the way the tournament table is obtained. The method, called smoothed Power-Weakness Ratio (sPWR), takes into account the dominance degree of the alternatives in each pairwise comparison by exploiting the differences between the criterion values. The rationale behind the method is described with the aid of an illustrative example on a simple benchmark dataset with known reference ranking of the samples. The main advantage of the new method over PWR is that its tournament table is much more informative and sensitive to the original data values than the classical pairwise comparison matrix. A multivariate comparison with other classical MCDM methods, performed on several diverse datasets, demonstrated that the results obtained by sPWR were quite similar to those obtained by Copeland Score and TOPSIS with range scaling. However, sPWR showed a higher tendency toward generating full rankings with an enhanced ability to remove ties in the pairwise comparisons.
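The difference between a win/loss tournament table and one that is sensitive to the size of the differences can be sketched as follows. The hard table counts, for each pair of alternatives, the weighted share of criteria won, with ties scored 0.5 (an assumed convention). The smoothed branch is a hypothetical illustration that scales each difference by the criterion range; it is not the sPWR formula, which the abstract does not give. All criteria are assumed to be maximised, and the function name `tournament_table` is invented for this example.

```python
import numpy as np

def tournament_table(X, weights, smooth=False):
    """Pairwise comparison matrix T, where T[i, k] measures how much alternative i beats k."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                  # normalise criterion weights
    diff = X[:, None, :] - X[None, :, :]             # diff[i, k, j] = x_ij - x_kj
    if smooth:
        span = X.max(axis=0) - X.min(axis=0)
        span[span == 0] = 1.0                        # avoid division by zero for constant criteria
        score = 0.5 * (1.0 + diff / span)            # hypothetical smoothing: linear in the difference
    else:
        score = np.where(diff > 0, 1.0, np.where(diff < 0, 0.0, 0.5))  # win / loss / tie
    T = score @ w                                    # weighted sum over criteria
    np.fill_diagonal(T, 0.0)                         # an alternative is not compared with itself
    return T

# Toy data: three alternatives, two equally weighted criteria.
X = [[3, 1], [2, 1], [1, 0]]
hard = tournament_table(X, [0.5, 0.5])
soft = tournament_table(X, [0.5, 0.5], smooth=True)
```

In the hard table a narrow win and a large win count the same, while the smoothed entries change with the size of the gap, which illustrates why such a table can be more informative about the original data values.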
Volume 270, Article 105624
Citations: 0
Improved variable reduction in Partial Least Squares modelling by global-minimum error reproducible Uninformative-Variable Elimination
IF 3.8 Zone 2 Chemistry Q2 AUTOMATION & CONTROL SYSTEMS Pub Date : 2026-02-15 Epub Date: 2025-11-29 DOI: 10.1016/j.chemolab.2025.105603
Jan P.M. Andries, Gerjen H. Tinnevelt, Yvan Vander Heyden
The well-known Uninformative-Variable Elimination for Partial Least Squares (UVE-PLS) is not reproducible with respect to the selected variables. In addition, UVE selects variables at the first minimum of the curve of the root mean squared error of cross validation (RMSECV) against the number of retained variables, which mostly results in rather large numbers of selected variables. There is therefore a need for a new, reproducible UVE method with better selective and preferably also better predictive abilities. To this end, the Global-Minimum Error Reproducible Uninformative-Variable Elimination method (GME-RUVE) is proposed and tested.
The GME-RUVE method combines the main characteristics of two existing methods: Jack-knife-based Partial Least Squares Regression (JK-PLSR) and Global-Minimum Error Uninformative-Variable Elimination (GME-UVE). JK-PLSR can be considered a reproducible version of the original UVE method.
In GME-RUVE, as in JK-PLSR, no artificial random variables are added to the X matrix. First, the significance of the PLS regression coefficients is determined by jack-knifing. Second, as in GME-UVE, either the global minimum or the critical RMSECV is used to select the variables. The performance of the new GME-RUVE method is investigated using four datasets with multivariate profiles (simulated profiles, NIR spectra or theoretical molecular descriptor profiles), resulting in 12 profile-response (X-y) combinations.
The predictive performance of GME-RUVE using the global RMSECV minimum, and both the selective and predictive performances of GME-RUVE using the critical RMSECV, are significantly better than those of both the JK-PLSR method, which uses the first local RMSECV minimum, and the existing UVE method. The selective and predictive performances of the new GME-RUVE method are also much better than those of the existing GME-UVE method. Moreover, the variables selected by GME-RUVE have a chemical meaning.
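The difference between stopping at the first local RMSECV minimum and at the global minimum can be illustrated with a minimal sketch. It is not the authors' implementation: the RMSECV values, the variable counts and the function names are hypothetical, and the curve is assumed to be ordered in elimination order (most retained variables first).

```python
def first_local_minimum(values):
    """Index of the first point after which the curve starts to rise."""
    for i in range(len(values) - 1):
        if values[i] < values[i + 1]:
            return i
    # The curve never rises, so its last point is the minimum.
    return len(values) - 1


def global_minimum(values):
    """Index of the lowest point of the whole curve (first one on ties)."""
    return min(range(len(values)), key=values.__getitem__)


# Hypothetical RMSECV curve, listed in elimination order.
retained = [100, 80, 60, 40, 20, 10]           # number of retained variables
rmsecv = [0.52, 0.47, 0.49, 0.45, 0.41, 0.58]  # illustrative values only

first_idx = first_local_minimum(rmsecv)
global_idx = global_minimum(rmsecv)

print("first local minimum keeps", retained[first_idx], "variables")
print("global minimum keeps", retained[global_idx], "variables")
```

In this made-up curve the first local minimum is reached early, while many variables are still retained, whereas the global minimum lies further along the elimination path. That is the behaviour the abstract gives as the reason the first-minimum rule tends to select rather large numbers of variables.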
Citations: 0
Chemometric modeling of physicochemical properties using Lanzhou and Ad-Hoc Lanzhou indices: A multi-scale approach for drug design and material informatics
IF 3.8 Zone 2 Chemistry Q2 AUTOMATION & CONTROL SYSTEMS Pub Date : 2026-02-15 Epub Date: 2025-12-03 DOI: 10.1016/j.chemolab.2025.105607
Song Tingting, Sadia Noureen, Saliha Kamran, Sobhy M. Ibrahim, Adnan Aslam
Chemical graph theory serves as a foundational framework in chemical informatics, offering molecular descriptors that enable the prediction of critical physicochemical properties. This study investigates the utility of two recently proposed topological indices (the Lanzhou index and its derivative, the Ad-hoc Lanzhou index) by computing them for four structurally diverse systems: Bismuth(III) Iodide (a layered inorganic compound), Nanostar Dendrimer (a hyperbranched polymer), and the two-dimensional Triangular Oxide and Triangular Silicate Networks. To assess the indices' predictive power, we established linear regression models correlating these indices with five experimentally relevant properties of 21 phenethylamine derivatives: molar refractivity (MR), octanol-water partition coefficient (LOG P), calculated Log P (CLog P), critical volume (CV), and boiling point. Statistical robustness was evaluated using the coefficient of determination (R²), F-statistic, and significance level (P-value). The models for boiling point, CV, and MR exhibited strong significance (R² > 0, P = 0), while LOG P and CLog P also showed statistically valid correlations (P = 0), though with slightly lower R² values. Notably, the Lanzhou index demonstrated marginally superior performance in predicting partition coefficients, suggesting its sensitivity to hydrophobic interactions. These results underscore the efficacy of Lanzhou-based indices as reliable tools for quantifying structure–property relationships, particularly in drug design applications where rapid estimation of solubility, volatility, and bioavailability is critical. Our findings advocate for the broader integration of these indices into cheminformatics pipelines to augment molecular screening and optimization processes.
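The regression statistics named in the abstract (R² and the F-statistic for a one-descriptor linear model) can be sketched as follows. This is a minimal illustration only: the index and property values are hypothetical and are not taken from the paper, and the function names are assumptions made for the example.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y, slope, intercept):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def f_statistic(r2, n):
    """F-statistic of a simple (one-predictor) regression with n samples."""
    return r2 * (n - 2) / (1 - r2)


# Hypothetical data: a topological index (x) and a property value (y).
index_values = [10, 14, 18, 22, 26]
property_values = [31.0, 39.5, 47.0, 56.5, 63.0]

slope, intercept = fit_line(index_values, property_values)
r2 = r_squared(index_values, property_values, slope, intercept)
f_value = f_statistic(r2, len(index_values))

print(f"slope={slope:.3f} intercept={intercept:.3f} R2={r2:.4f} F={f_value:.1f}")
```

The P-value reported alongside these statistics would come from the F distribution with (1, n - 2) degrees of freedom. It is left out here because it needs a statistics library rather than the standard library alone.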
Citations: 0
Bridging lab-to-clinic: microbiological screening via Swin-Ultra Transformer with transfer learning
IF 3.8 Zone 2 Chemistry Q2 AUTOMATION & CONTROL SYSTEMS Pub Date : 2026-02-15 Epub Date: 2025-12-02 DOI: 10.1016/j.chemolab.2025.105605
Yunxin Wang, Wenjing Zhang, Hongguo Wei, Yuetian Ren, Haosong Du, Wenbin Xu, Ailing Tan, Shuo Chen
Bacterial infections are a critical global health issue, requiring rapid and precise pathogen identification for effective infection control. Traditional methods, such as culture and nucleic acid amplification, are often slow and lack sensitivity. Raman spectroscopy combined with deep learning has become a powerful technique for microbial identification. However, factors such as bacterial physiological states, genetic variation, interference from biological materials, and differences in laboratory conditions still make its practical application challenging. This study introduces a feature-enhanced dual-attention pathway Shifted Window-Ultra (Swin-Ultra) Transformer architecture, integrated with deep transfer learning, to address challenges such as bacterial physiological states, genetic variation, and laboratory condition discrepancies. A Bacterial Pre-trained Transformer (BPT) was developed using the Bacteria-ID database, achieving excellent classification performance (98.26 % accuracy). Fine-tuning with clinical datasets yielded accuracies of 99.80 % for bacterial pathogens and 98.53 % for Cryptococcus genotypes. This approach bridges laboratory models and clinical applications, enhancing unknown pathogen identification, infection control, and public health surveillance, with significant potential to improve patient outcomes.
Citations: 0
Explainable machine learning enables robust evaluation of extracted ion chromatograms in LC–MS metabolomics
IF 3.8 Zone 2 Chemistry Q2 AUTOMATION & CONTROL SYSTEMS Pub Date : 2026-02-15 Epub Date: 2025-11-26 DOI: 10.1016/j.chemolab.2025.105591
Juehong Dai, Liheng Dong, Jingjing Xu, Lingli Deng, Lei Guo, Jiyang Dong
Reliable evaluation of extracted ion chromatograms (EICs) remains a persistent challenge in LC–MS metabolomics, as inaccuracies in peak identification can profoundly impact subsequent data analysis and interpretation. While recent deep learning approaches show promise, their computational burden, limited generalizability, and lack of interpretability hinder broad adoption in routine analytical workflows. To address these limitations, we introduce EXACT-EIC (EXplainable Assessment of Chromatogram qualiTy for EICs), a lightweight, explainable machine learning framework. EXACT-EIC employs a thoughtfully designed set of 34 handcrafted features to perform two critical tasks: effective binary classification of EICs (peak vs. noise) and quantitative quality scoring. Benchmarking on curated in-house and public testing sets demonstrated that EXACT-EIC achieved 95.2 % accuracy and 98.1 % recall for classification. For quantitative assessment, it attained a mean absolute error of 0.70 on a 1–10 expert-assigned quality scale. These results consistently outperformed state-of-the-art deep learning methods including PeakOnly and QuanFormer. Furthermore, Shapley Additive exPlanations (SHAP) analysis quantified the contribution of key chromatographic features (e.g., apex-boundary ratio, distribution entropy) to model predictions, offering transparent mechanistic insights absent in "black-box" architectures. By combining robustness, interpretability, and computational efficiency, EXACT-EIC facilitates reliable EIC evaluation across diverse platforms and experimental conditions. It provides a practical, deployable solution for automated quality control and confident metabolite annotation, addressing a critical need in untargeted LC–MS metabolomics workflows.
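The three evaluation metrics quoted in the abstract (accuracy and recall for the peak-vs-noise classification, mean absolute error for the 1–10 quality score) can be sketched as below. The labels and scores are hypothetical and unrelated to the paper's data, and the function names are assumptions for this example. The sketch shows only how the metrics are defined, not how EXACT-EIC computes its features or predictions.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """True positives divided by all truly positive samples."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels: 1 = peak, 0 = noise.
true_labels = [1, 1, 1, 0, 0, 1, 0, 1]
pred_labels = [1, 1, 0, 0, 1, 1, 0, 1]

# Hypothetical quality scores on a 1-10 scale.
true_scores = [7, 3, 9, 5]
pred_scores = [6.5, 3.5, 8, 5]

acc = accuracy(true_labels, pred_labels)
rec = recall(true_labels, pred_labels)
mae = mean_absolute_error(true_scores, pred_scores)

print(f"accuracy={acc:.2f} recall={rec:.2f} MAE={mae:.2f}")
```

Recall is reported separately from accuracy because, for peak detection, missing a true peak (a false negative) is counted only by recall, while accuracy treats both kinds of error alike.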
Citations: 0