
Journal of Chemometrics: Latest Articles

The Classification Limit of Detection: Estimating Sample-Level Classification Uncertainty in Spectroscopy Using Monte Carlo Error Propagation of Spectral Noise
IF 2.1 | CAS Zone 4 (Chemistry) | Q1 SOCIAL WORK | Pub Date: 2025-07-12 | DOI: 10.1002/cem.70048
Helder V. Carneiro, Caelin P. Celani, Karl S. Booksh

This study presents a novel Monte Carlo–based methodology for estimating classification uncertainty in chemometric models by propagating spectral measurement noise. Unlike traditional approaches that treat classification as deterministic, this method simulates realistic noise structures, both independent and correlated, captured from replicate spectral measurements, to quantify sample-specific uncertainty. The technique is applicable to both linear and nonlinear models, including partial least squares discriminant analysis (PLS-DA) and support vector machines (SVMs) with various kernels. The methodology was validated on three datasets: synthetic 2D simulations for controlled model geometry, X-ray fluorescence (XRF) spectra of colored glass rods, and laser-induced breakdown spectroscopy (LIBS) data from Dalbergia wood species. The results show that uncertainty increases with spectral similarity and when noise structures are aligned perpendicular to decision boundaries. In real-world applications, classification metrics alone proved insufficient to assess model reliability: uncertainty intervals identified ambiguous predictions even where classification accuracy was perfect. This work advances chemometric analysis by linking measurement uncertainty to classification outcomes, offering a robust framework for decision-making in high-stakes analytical contexts.
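
A minimal sketch of the Monte Carlo idea, assuming a generic decision function rather than the authors' PLS-DA or SVM models. The noise structure is taken from the deviations of replicate spectra around their mean, and random Gaussian combinations of those deviations are added to the test spectrum, so correlation between channels is preserved. The fraction of noisy copies assigned to class 1 and a 95% percentile interval of the decision value serve as sample-level uncertainty. The function name `mc_class_uncertainty`, the sign convention (decision value > 0 means class 1) and the default draw count are illustrative assumptions.

```python
import numpy as np

def mc_class_uncertainty(x, replicates, decision_fn, n_draws=2000, seed=0):
    """Propagate replicate-estimated spectral noise through a classifier.

    x           : (p,) spectrum to classify
    replicates  : (r, p) repeated measurements of one sample (noise source)
    decision_fn : maps an (n, p) array to (n,) decision values; > 0 -> class 1
    """
    rng = np.random.default_rng(seed)
    # Deviations of each replicate from the replicate mean carry both the
    # independent and the channel-correlated parts of the noise.
    resid = replicates - replicates.mean(axis=0)
    r = resid.shape[0]
    # Gaussian combinations of the deviations have covariance
    # resid.T @ resid / (r - 1), i.e. the sample noise covariance,
    # without forming or factorizing that (possibly singular) matrix.
    z = rng.standard_normal((n_draws, r))
    noise = z @ resid / np.sqrt(r - 1)
    scores = decision_fn(x + noise)
    p_class1 = float(np.mean(scores > 0))
    lo, hi = np.percentile(scores, [2.5, 97.5])
    return p_class1, (float(lo), float(hi))
```

A sample whose interval straddles zero is ambiguous even when its point prediction happens to be correct, which is the situation the abstract describes as hidden by accuracy alone.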

Journal of Chemometrics, Vol. 39, Issue 7
Citations: 0
A Dynamic Iterative Data Cleaning Strategy Based on Model Feedback to Enhance the Prediction Accuracy of Nanocellulose Emulsions
IF 2.1 | CAS Zone 4 (Chemistry) | Q1 SOCIAL WORK | Pub Date: 2025-07-12 | DOI: 10.1002/cem.70046
Long Wang, Zi'ang Xia, Yao Zhang, Xiaoyu Liu, Chaojie Li, Xue Li, Jiahao Dai, Mingshun Bi, Jingxue Yang, Heng Zhang

The effectiveness of artificial neural networks, a key technology in artificial intelligence, depends heavily on the quality of the input data. Data cleaning, a crucial component of data preprocessing, plays a vital role in improving the accuracy, robustness, and generalization of neural network models. In this study, a Feedback-Driven Iterative Cleaning (FDIC) framework guided by model performance was developed and applied to droplet-size prediction models for nanocellulose-stabilized Pickering emulsion systems. After between 1% and 40% of the data were randomly removed, an artificial neural network model was built with cellulose nanocrystal (CNC) particle size (X1), CNC concentration (X2), and the oil–water volume ratio of CNC to oil-phase monomer (X3) as input variables and emulsion droplet size (Y) as the response. Model accuracy after data removal was evaluated using the coefficient of determination (R²), mean squared error (MSE), and mean absolute scaled error (MASE). The main finding was that targeted removal of a small portion of the data significantly improved the model's predictive power. Specifically, removing 5% of the dataset gave the best performance, with R² rising from 0.5307 without cleaning to 0.7258, an MSE of 183.4917, and a MASE of 0.4060. This indicates a significant, quantifiable improvement in model accuracy through the iterative cleaning process. The study also revealed a nonlinear relationship between the number of iterations and the model's generalization ability. These findings offer a new methodological tool for data governance in the smart era and demonstrate significant value in dynamic environments.
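
The sketch below illustrates one way a feedback-driven cleaning loop could work; it is not the authors' FDIC implementation. Assumptions: an ordinary least-squares model stands in for the ANN, each iteration drops the training sample with the largest absolute residual under the current model, and the "feedback" is R² on a fixed validation set that is never cleaned, so any gain is not simply the result of deleting hard points from the evaluation data. All names (`feedback_clean`, `fit_predict`, `max_frac`) are hypothetical.

```python
import numpy as np

def fit_predict(X_train, y_train, X_eval):
    """Least-squares fit with intercept (a stand-in for the ANN); returns (eval, train) predictions."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    B = np.column_stack([np.ones(len(X_eval)), X_eval])
    return B @ coef, A @ coef

def r2(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)

def feedback_clean(X, y, X_val, y_val, max_frac=0.4):
    """Drop the worst-fitted training sample one at a time, scoring each stage on
    an untouched validation set; return the best removed fraction and its R^2."""
    keep = np.arange(len(y))
    n_max = int(max_frac * len(y))
    history = []  # (fraction removed, validation R^2)
    for n_removed in range(n_max + 1):
        val_pred, train_pred = fit_predict(X[keep], y[keep], X_val)
        history.append((n_removed / len(y), r2(y_val, val_pred)))
        if n_removed == n_max:
            break
        # Feedback step: the current model's largest absolute residual picks the next removal.
        worst = np.argmax(np.abs(y[keep] - train_pred))
        keep = np.delete(keep, worst)
    best_frac, best_r2 = max(history, key=lambda item: item[1])
    return best_frac, best_r2, history
```

The returned `history` traces validation R² against the fraction removed; plotting it is one way to look for a nonlinear relationship between the number of cleaning iterations and generalization.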

Journal of Chemometrics, Vol. 39, Issue 7
Citations: 0
Nondestructive Identification of Paper Based on Relative Formation Time Using Three-Dimensional Fluorescence Spectroscopy Combined With Supervised Learning
IF 2.1 | CAS Zone 4 (Chemistry) | Q1 SOCIAL WORK | Pub Date: 2025-07-11 | DOI: 10.1002/cem.70043
Xiaohong Chen, Yuhuan He, Lan Cui, Hongda Li, Xiaojing Wu

To enable nondestructive analysis and identification of the relative formation time of paper evidence, and to address difficulties in document authentication in forensic science, this study used three-dimensional fluorescence spectroscopy data from paper evidence of the same brand and model, collected in the same storage environment over the period 2012–2023. After preprocessing steps including scatter removal, noise smoothing, and principal component analysis (PCA), machine learning algorithms such as K-nearest neighbors (KNN) and linear discriminant analysis (LDA) were used for classification and prediction on specific feature bands. KNN and LDA achieved accuracies of 94.5% and 98.9%, respectively. Furthermore, LDA was used to predict the relative formation time of paper samples in the sample library, achieving an accuracy of 98.0%. Finally, the established model was successfully applied to an actual case involving suspected “forged official documents.” It accurately determined the relative formation time of the forged paper, and the results were consistent with the suspect's confession.
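
A minimal NumPy sketch of the PCA-then-KNN step, assuming each excitation–emission matrix (EEM) has already been scatter-corrected and smoothed and is flattened into one row vector. Scatter removal, feature-band selection and the LDA model from the study are not reproduced; `pca_fit`, `knn_predict` and `classify_eems` are hypothetical helper names, and the defaults (2 components, k = 3) are arbitrary choices.

```python
import numpy as np

def pca_fit(X, n_comp):
    """Return the column mean and the first `n_comp` PCA loadings of X (n, p)."""
    mean = X.mean(axis=0)
    # Right singular vectors of the mean-centred data are the PCA loadings.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_comp].T

def knn_predict(T_train, y_train, T_test, k=3):
    """Majority vote among the k nearest training scores (Euclidean distance)."""
    d = np.linalg.norm(T_test[:, None, :] - T_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(y_train[row]).argmax() for row in nearest])

def classify_eems(eem_train, y_train, eem_test, n_comp=2, k=3):
    """eem_* have shape (n_samples, n_excitation, n_emission); labels are ints >= 0."""
    X_tr = eem_train.reshape(len(eem_train), -1)
    X_te = eem_test.reshape(len(eem_test), -1)
    mean, P = pca_fit(X_tr, n_comp)
    # Test samples are projected with the training mean and loadings only.
    return knn_predict((X_tr - mean) @ P, y_train, (X_te - mean) @ P, k)
```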

Journal of Chemometrics, Vol. 39, Issue 7
Citations: 0
XAI-2DCOS: Enhancing Interpretability in Spectral Deep Learning Models Through 2D Correlation Spectroscopy
IF 2.1 | CAS Zone 4 (Chemistry) | Q1 SOCIAL WORK | Pub Date: 2025-07-11 | DOI: 10.1002/cem.70045
Jhonatan Contreras, Thomas Bocklitz

Deep learning (DL) has significantly advanced Raman spectral analysis, achieving high accuracy and efficiency. However, the complexity and opacity of DL models limit their application in areas where understanding and transparency are essential. To address this, we present XAI-2DCOS, an eXplainable Artificial Intelligence (XAI) framework that employs two-dimensional correlation spectroscopy (2DCOS). Traditionally, 2DCOS reveals the sequence of molecular changes under varying conditions. We repurpose it to enhance the interpretability of DL models by linking changes in spectral features to model outputs, identifying critical wavenumbers and how their variations affect model accuracy. We applied XAI-2DCOS to a DL model trained on a dataset of oil Raman spectra, demonstrating its ability to identify critical spectral features consistent with domain knowledge. To improve robustness, we integrated a conditional generative adversarial network (CGAN) for data augmentation; the CGAN generates synthetic data so that spectra are present across the entire probability range. A normalized relevance score quantifies the contribution of each wavenumber to the model's prediction, and a predictive probability map delineates decision boundaries within the PCA space. Synchronous 2DCOS maps are used to guide spectral adjustments that raise prediction confidence for specific classes. These adjustments can affect multiple output classes, scaling their output activations differently, which suggests that crossing a threshold can shift the model's decision. Our results show that XAI-2DCOS improves the interpretability and reliability of DL models applied to Raman spectra. Furthermore, CGAN data augmentation extends the applicability of XAI-2DCOS to smaller datasets.
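
For readers unfamiliar with 2DCOS, the sketch below computes Noda's generalized synchronous map Φ = ỸᵀỸ/(m − 1) and asynchronous map Ψ = ỸᵀNỸ/(m − 1), where Ỹ holds the m mean-centred (dynamic) spectra and N is the Hilbert-Noda matrix with N[j, k] = 1/(π(k − j)) off the diagonal. How XAI-2DCOS builds its perturbation series from model outputs is not described in the abstract, so the input here is simply an ordered array of spectra; the function name `two_d_cos` is hypothetical.

```python
import numpy as np

def two_d_cos(spectra):
    """Generalized 2D correlation maps (Noda) for an ordered series of spectra.

    spectra : (m, p) array, row i measured at the i-th perturbation step
    returns : (sync, asyn), each (p, p)
    """
    m = spectra.shape[0]
    # Dynamic spectra: deviations from the mean (reference) spectrum.
    dyn = spectra - spectra.mean(axis=0)
    sync = dyn.T @ dyn / (m - 1)
    # Hilbert-Noda matrix: N[j, k] = 1 / (pi * (k - j)) off the diagonal, 0 on it.
    j, k = np.indices((m, m))
    diff = (k - j).astype(float)
    noda = np.zeros((m, m))
    off = diff != 0
    noda[off] = 1.0 / (np.pi * diff[off])
    asyn = dyn.T @ noda @ dyn / (m - 1)
    return sync, asyn
```

The diagonal of the synchronous map (the autopeaks) equals the variance of each wavenumber along the perturbation, which is the kind of per-wavenumber signal a relevance analysis could draw on.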

Journal of Chemometrics, Vol. 39, Issue 7 | PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.70045
Citations: 0
Editorial: Honoring Prof. Age K. Smilde
IF 2.1 | CAS Zone 4 (Chemistry) | Q1 SOCIAL WORK | Pub Date: 2025-07-10 | DOI: 10.1002/cem.70052
Rasmus Bro
It is both a privilege and an emotional moment for me to write this editorial for the special issue of the Journal of Chemometrics honoring Prof. Age K. Smilde, who recently retired. For me, and for countless others in our field, Prof. Smilde (more informally known as Age) has been more than a scholar; he has been a mentor, a collaborator, and an inspiration whose contributions have left a huge mark on the world of chemometrics.

Looking back, it feels almost surreal to think of my early days in academia, 30 years ago, when I was navigating the complex world of multi-way tensor analysis. At the time, Age seemed to me to be the quintessential ‘all-knowing’ professor. His mastery of the field, combined with a willingness to mentor and nurture young scientists, made a profound difference in my career. I remember a conference where he explained the complexity of tensor rank. I quickly grasped the problem and, slightly arrogantly, said: I will fix it. I tried. I was very fast and 100% wrong. I never managed to make even the slightest progress!

He played a pivotal role in helping me craft some of my earliest papers, including one of the first approaches to tensor regression. Our discussions on the properties of multi-way arrays and their applications remain etched in my memory, not just as lessons in science but as moments of shared curiosity.

Age's career is nothing short of extraordinary. From his foundational work at the University of Groningen to his tenure at the University of Amsterdam, where he led the group later known as Biosystems Data Analysis, Age has consistently been at the forefront of methodological advances, and not just in chemometrics. His work on multi-way analysis, data integration, and systems biology has truly shaped each of these fields. It is no surprise that he has been honored with numerous awards, such as the prestigious Herman Wold Gold Medal and the Kowalski Award, reflecting his pioneering contributions and global recognition.

What sets Age apart is his ability to foster collaboration and build bridges within the scientific community. He introduced me to some of the most significant researchers not only in chemometrics but also in psychometrics, widening my horizons and opening doors that would otherwise have remained closed. His efforts to create platforms for collaboration, such as co-founding TRICAP and contributing to international chemometrics meetings, have enriched our discipline.

Reflecting on the arc of our careers, I cannot help but smile at the realization that the ‘old’ professor who once seemed so far ahead of me is, in fact, only a few years my senior. Time has a way of leveling us, and today I count Age not only as a colleague but also as a dear friend and peer. His wisdom, humility, and warmth continue to inspire, and his legacy will undoubtedly endure through the countless students, collaborators, and researchers he has influenced.

This special issue is a testament to Prof. Smilde's impact on our field. It brings together contributions from researchers whose work has been shaped by his ideas, mentorship, and collaboration. It is a most fitting tribute to a scientist like Age. On behalf of everyone who has had the privilege of working with Prof. Smilde: thank you, Age, for your tireless contributions, your mentorship, and your friendship. We celebrate not only your remarkable career but also the person behind it, a true giant of chemometrics.
Journal of Chemometrics, Vol. 39, Issue 7 | PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.70052
Citations: 0
Accurate and Rational Collision Cross Section Prediction Using Voxel-Projected Area and Deep Learning
IF 2.1 | CAS Zone 4 (Chemistry) | Q1 SOCIAL WORK | Pub Date: 2025-07-08 | DOI: 10.1002/cem.70040
Jiongyu Wang, Yuxuan Liao, Ting Xie, Ruixi Chen, Jiahui Lai, Zhimin Zhang, Hongmei Lu

Ion mobility spectrometry–mass spectrometry (IMS-MS) enables rapid acquisition of collision cross sections (CCS), a critical physicochemical property for analyte characterization. Although CCS is theoretically defined as the rotationally averaged projected area of 3D atomic spheres, existing models have underused this geometric insight. Here, we present a projected area–based CCS prediction method (PACCS). It integrates a voxel-projected area approximation, features extracted by a graph neural network (GNN), and m/z to achieve accurate and rational CCS prediction. A voxel-based algorithm efficiently calculates molecular projected areas by using Fibonacci grid sampling and discretizing 3D conformers into voxel grids. PACCS performs strongly, achieving a median relative error (MedRE) of 1.03% and a coefficient of determination (R²) of 0.994 on the test set. On an external test set, comparison with AllCCS2, GraphCCS, SigmaCCS, CCSbase, and DeepCCS highlights the superiority of PACCS, with 80.1% of predictions showing errors below 3%. Notably, PACCS applies broadly across diverse molecular types, including environmental contaminants (R² = 0.954–0.979) and structurally complex phycotoxins (R² = 0.961), underscoring its robustness and versatility. Parallelization improves computational efficiency, enabling large-scale CCS database generation (e.g., 5.9 million entries for ChEMBL within 10 h). Ablation studies confirm the pivotal role of voxel-projected areas (Pearson correlation coefficients > 0.988), while stability analyses show minimal sensitivity to conformational variability (standard deviation of R² = 0.00003). PACCS provides an open-source, scalable solution for expanding CCS databases and advancing compound identification in metabolomics and environmental analysis.
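
To make the geometric idea concrete, the sketch below estimates a rotationally averaged projected area: viewing directions come from a Fibonacci (golden-angle) lattice on the sphere, and for each direction the atomic spheres are projected onto a perpendicular plane and their union is measured by counting pixels. It rasterizes each projection on a 2D pixel grid instead of building a 3D voxel grid, so it is a simplified stand-in, not PACCS's algorithm. Atomic radii, the pixel size `h` and all function names are hypothetical.

```python
import numpy as np

def fibonacci_directions(n):
    """n roughly uniform unit vectors on the sphere (Fibonacci / golden-angle lattice)."""
    i = np.arange(n)
    z = 1.0 - 2.0 * (i + 0.5) / n
    r = np.sqrt(1.0 - z ** 2)
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i  # golden-angle increments
    return np.column_stack([r * np.cos(phi), r * np.sin(phi), z])

def projected_area(coords, radii, direction, h=0.05):
    """Area of the union of atomic discs projected along `direction`, by pixel counting."""
    d = direction / np.linalg.norm(direction)
    # Orthonormal basis (u, v) of the projection plane perpendicular to d.
    helper = np.array([1.0, 0.0, 0.0]) if abs(d[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(d, helper)
    u /= np.linalg.norm(u)
    v = np.cross(d, u)
    p = np.column_stack([coords @ u, coords @ v])  # 2D centres of the projected atoms
    lo = (p - radii[:, None]).min(axis=0)
    hi = (p + radii[:, None]).max(axis=0)
    # Pixel centres of an h x h grid covering every projected disc.
    gx = np.arange(lo[0], hi[0] + h, h) + h / 2
    gy = np.arange(lo[1], hi[1] + h, h) + h / 2
    X, Y = np.meshgrid(gx, gy, indexing="ij")
    covered = np.zeros(X.shape, dtype=bool)
    for (px, py), rad in zip(p, radii):
        covered |= (X - px) ** 2 + (Y - py) ** 2 <= rad ** 2
    return covered.sum() * h * h

def mean_projected_area(coords, radii, n_dirs=100, h=0.05):
    """Average projected area over Fibonacci-sampled viewing directions."""
    areas = [projected_area(coords, radii, d, h) for d in fibonacci_directions(n_dirs)]
    return float(np.mean(areas))
```

Turning this area into a CCS estimate would additionally require handling the drift gas (for example, enlarging each atomic radius by a probe radius); per the abstract, PACCS instead combines the area with GNN features and m/z, which this sketch does not attempt.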

Journal of Chemometrics, Vol. 39, Issue 7
Citations: 0
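The PACCS entry computes molecular projected areas by discretizing a 3D conformer into voxels and sampling viewing directions on a Fibonacci grid. The sketch below illustrates that geometric idea only, under stated assumptions: atoms are hard spheres with given radii, a voxel is "occupied" when its center lies inside any sphere, and the shadow along each direction is counted as occupied pixels of the same edge length. All function names are hypothetical, and this is not the published PACCS code, which also combines the area with GNN features and m/z.

```python
import numpy as np

def fibonacci_sphere(n):
    """n quasi-uniform unit vectors on the sphere (golden-angle / Fibonacci lattice)."""
    i = np.arange(n)
    z = 1.0 - (2.0 * i + 1.0) / n
    rho = np.sqrt(1.0 - z * z)
    phi = i * np.pi * (3.0 - np.sqrt(5.0))  # golden-angle increments in azimuth
    return np.stack([rho * np.cos(phi), rho * np.sin(phi), z], axis=1)

def voxelize(coords, radii, s):
    """Centers of cubic voxels (edge s) whose center lies inside at least one atomic sphere."""
    lo = (coords - radii[:, None]).min(axis=0)
    hi = (coords + radii[:, None]).max(axis=0)
    axes = [np.arange(a, b + s, s) for a, b in zip(lo, hi)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    inside = np.zeros(len(grid), dtype=bool)
    for c, r in zip(coords, radii):
        inside |= np.sum((grid - c) ** 2, axis=1) <= r * r
    return grid[inside]

def projected_area(centers, direction, s):
    """Shadow area along `direction`, counted as distinct occupied s-by-s pixels."""
    d = direction / np.linalg.norm(direction)
    helper = np.array([1.0, 0.0, 0.0]) if abs(d[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    e1 = np.cross(d, helper)
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(d, e1)  # (e1, e2) is an orthonormal basis of the projection plane
    uv = np.stack([centers @ e1, centers @ e2], axis=1)
    pixels = np.unique(np.floor(uv / s).astype(np.int64), axis=0)
    return len(pixels) * s * s

def mean_projected_area(coords, radii, n_dirs=100, s=0.1):
    """Rotationally averaged projected area over a Fibonacci set of viewing directions."""
    centers = voxelize(coords, radii, s)
    return float(np.mean([projected_area(centers, d, s) for d in fibonacci_sphere(n_dirs)]))

# Sanity check: one sphere of radius 2 (arbitrary units); the exact projected area is pi * r^2
area = mean_projected_area(np.zeros((1, 3)), np.array([2.0]))
print(area, np.pi * 2.0 ** 2)
```

Pixel counting slightly overestimates the area at the disk rim, so a finer voxel edge `s` trades speed for accuracy.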
Frequency-Domain Alignment of Heterogeneous, Multidimensional Separations Data Through Complex Orthogonal Procrustes Analysis
IF 2.1 Zone 4 (Chemistry) Q1 SOCIAL WORK Pub Date: 2025-07-07 DOI: 10.1002/cem.70042
Michael Sorochan Armstrong

Multidimensional separations data can reveal detailed information about complex biological samples. However, data analysis remains a challenge in this area because the peaks that represent chemical factors may drift along the first- and second-dimension retention times over the course of several analytical runs. This makes higher-level analyses difficult, because a one-to-one comparison of samples is seldom possible without sophisticated preprocessing routines. This work offers a very simple solution to the alignment problem: an orthogonal Procrustes analysis of the frequency-domain representation of the data, in which the relative drift and amplitude of each coefficient are represented as a complex number. Its performance is evaluated on synthetically generated data with nonlinear retention distortions, and its applicability is assessed on quantitative problems using experimental calibration data and on untargeted metabolomics data. The analysis is extremely simple and can be recreated in just a few lines of code, relying only on fast algorithms for matrix multiplication and Fourier transforms.

Journal of Chemometrics, Vol. 39, Issue 7. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.70042
Citations: 0
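The Procrustes entry rests on two standard facts: a shift along the retention axis becomes a per-coefficient phase rotation in the Fourier domain, and the unitary matrix Q minimizing ||AQ − B||_F has the closed form Q = UVᴴ, where UΣVᴴ is the SVD of AᴴB. The sketch below shows the generic complex orthogonal Procrustes solution on a toy problem, assuming a batch of m ≥ n reference signals and a pure circular shift as the "drift"; `complex_procrustes` and `align_by_procrustes` are hypothetical helpers, not the author's published pipeline, which targets nonlinear distortions in two-dimensional data.

```python
import numpy as np

def complex_procrustes(A, B):
    """Unitary Q minimizing ||A @ Q - B||_F (complex orthogonal Procrustes)."""
    # Closed form: SVD of the cross-product A^H B, then Q = U V^H
    U, _, Vh = np.linalg.svd(A.conj().T @ B)
    return U @ Vh

def align_by_procrustes(X_ref, X_drift):
    """Hypothetical helper: align drifted signals to references in the frequency domain."""
    A = np.fft.fft(X_ref, axis=1)    # reference spectra, one signal per row
    B = np.fft.fft(X_drift, axis=1)  # drifted spectra
    Q = complex_procrustes(A, B)     # estimated transform with B ~ A @ Q
    # Q is unitary, so its inverse is Q^H; undo it and return to the retention-time domain
    aligned = np.fft.ifft(B @ Q.conj().T, axis=1).real
    return aligned, Q

rng = np.random.default_rng(0)
m, n, d = 40, 32, 5
X = rng.normal(size=(m, n))   # synthetic reference signals
Y = np.roll(X, d, axis=1)     # circular drift of d points along the retention axis
aligned, Q = align_by_procrustes(X, Y)

# A circular shift is the diagonal unitary matrix of phases exp(-2*pi*i*k*d/n)
expected_phase = np.exp(-2j * np.pi * np.arange(n) * d / n)
print(np.allclose(Q, np.diag(expected_phase)))  # True
print(np.allclose(aligned, X))                  # True
```

Because the recovered Q is exactly the diagonal phase matrix, the example makes concrete why a complex-valued Procrustes rotation can encode retention drift.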
Self-Optimizing Radial Basis Function Support Vector Classifier (SO-RBFSVC)
IF 2.1 Zone 4 (Chemistry) Q1 SOCIAL WORK Pub Date: 2025-05-26 DOI: 10.1002/cem.70038
Qudus Ayodeji Thanni, Peter de Boves Harrington

Support vector classifiers (SVCs) typically use radial basis function (RBF) kernels to map data into higher-dimensional spaces, which may improve the linear separability of otherwise nonseparable classes. We present a novel self-optimizing radial basis function support vector classifier (SO-RBFSVC) that integrates response surface methodology (RSM), two-dimensional cubic spline interpolation, and bootstrapped Latin partitions (BLPs) for automated hyperparameter tuning. The SO-RBFSVC simultaneously optimizes the RBF kernel width (σ) and the cost parameter (C) using an interpolated response surface of generalized prediction accuracies. It was compared with two other self-optimizing classifiers: super SVC (sSVC) and super partial least squares discriminant analysis (sPLS-DA). Four datasets were evaluated: (i) hemp and marijuana discrimination using proton nuclear magnetic resonance spectra, (ii) barley growth location using near-infrared spectra, (iii) glass-type identification based on elemental composition, and (iv) wine cultivar classification from physicochemical properties. External validation showed that SO-RBFSVC performed comparably to the other models, with error rates of 0.4 ± 0.5% for hemp/marijuana, 7 ± 1% for glass, and 6 ± 1% for wine, and outperformed the linear models on the barley NIR data with an error rate of 10 ± 1%. For the first time, generalized sensitivity analysis (GSA) was applied to quantify model linearity; GSA revealed high nonlinearity in the barley dataset, justifying a nonlinear model. The SO-RBFSVC provides robust, automated, and easy-to-use classifier tuning for both low- and high-dimensional datasets.

Journal of Chemometrics, Vol. 39, Issue 6. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.70038
Citations: 0
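The SO-RBFSVC entry selects σ and C from an interpolated response surface of generalized prediction accuracies. The sketch below illustrates only that interpolation-and-maximization step, under assumptions not taken from the abstract: the accuracies sit on a coarse grid in log10(σ) and log10(C), the "two-dimensional cubic spline" is a tensor-product natural cubic spline, and the accuracy values are synthetic placeholders standing in for bootstrapped Latin partition results. `natural_cubic_spline`, `spline_surface`, and `best_on_surface` are hypothetical names.

```python
import numpy as np

def natural_cubic_spline(x, y, xq):
    """Evaluate the natural cubic spline through (x, y) at xq; x must be strictly increasing."""
    x, y, xq = (np.asarray(a, dtype=float) for a in (x, y, xq))
    n = len(x)
    h = np.diff(x)
    # Second derivatives M solve a tridiagonal system; natural ends force M[0] = M[-1] = 0
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0
    for i in range(1, n - 1):
        A[i, i - 1], A[i, i], A[i, i + 1] = h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]
        rhs[i] = 6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, rhs)
    j = np.clip(np.searchsorted(x, xq) - 1, 0, n - 2)  # interval index of each query point
    left, right, hj = xq - x[j], x[j + 1] - xq, h[j]
    return ((M[j] * right**3 + M[j + 1] * left**3) / (6.0 * hj)
            + (y[j] / hj - M[j] * hj / 6.0) * right
            + (y[j + 1] / hj - M[j + 1] * hj / 6.0) * left)

def spline_surface(u, v, Z, uq, vq):
    """Tensor-product natural cubic spline of Z[i, k] = response at (u[i], v[k])."""
    Zv = np.array([natural_cubic_spline(v, row, vq) for row in Z])         # along v
    return np.array([natural_cubic_spline(u, col, uq) for col in Zv.T]).T  # along u

def best_on_surface(log_sigma, log_C, acc, n_fine=201):
    """Interpolate the accuracy grid finely; return (sigma, C, accuracy) at the maximum."""
    uq = np.linspace(log_sigma[0], log_sigma[-1], n_fine)
    vq = np.linspace(log_C[0], log_C[-1], n_fine)
    surf = spline_surface(log_sigma, log_C, acc, uq, vq)
    i, k = np.unravel_index(np.argmax(surf), surf.shape)
    return 10.0 ** uq[i], 10.0 ** vq[k], surf[i, k]

# Synthetic coarse accuracy grid, peaked at log10(sigma) = 0.3 and log10(C) = 1.2
log_sigma = np.linspace(-2, 2, 9)
log_C = np.linspace(-1, 3, 9)
U, V = np.meshgrid(log_sigma, log_C, indexing="ij")
acc = 0.9 - 0.05 * (U - 0.3) ** 2 - 0.05 * (V - 1.2) ** 2
sigma, C, best_acc = best_on_surface(log_sigma, log_C, acc)
print(f"sigma = {sigma:.3f}, C = {C:.2f}, interpolated accuracy = {best_acc:.4f}")
```

If the grid accuracies were produced with scikit-learn's `SVC`, note that its RBF kernel is exp(−gamma·‖x − x′‖²), so a width σ under the convention exp(−‖x − x′‖²/(2σ²)) corresponds to gamma = 1/(2σ²); which convention the paper uses is not stated in the abstract.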
How Are Chemometric Models Validated? A Systematic Review of Linear Regression Models for NIRS Data in Food Analysis
IF 2.1 Zone 4 (Chemistry) Q1 SOCIAL WORK Pub Date: 2025-05-19 DOI: 10.1002/cem.70036
Jokin Ezenarro, Daniel Schorn-García

Chemometric models play a critical role in the spectroscopic analysis of food, particularly with near-infrared spectroscopy (NIRS), enabling accurate prediction and monitoring of physicochemical properties. Although chemometric methods have proven useful in NIRS analysis, their reliability depends on rigorous validation to ensure that their predictions are sound and that the models are applicable. This systematic review examines the validation strategies applied to regression models in NIRS-based food analysis, emphasising cross-validation, external validation and figures of merit (FoM) as key evaluation tools. The comprehensive literature search identified trends in validation methodology, highlighting frequent reliance on partial least squares (PLS) regression and common flaws in how validation is performed and reported. Although external validation is considered the best approach, many studies omit it and rely solely on cross-validation, which may lead to overoptimistic estimates of model performance. Furthermore, inconsistencies in the selection and definition of FoM hinder direct comparison across studies. This review underscores the need for greater methodological transparency and rigour in the validation of chemometric models to enhance their reliability.

Journal of Chemometrics, Vol. 39, Issue 6. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.70036
Citations: 0
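The validation review contrasts cross-validation with external validation and notes that figures of merit are defined inconsistently across studies. The sketch below makes the distinction concrete under stated assumptions: ridge regression stands in for the PLS models the review discusses, the data are synthetic, and RPD is defined as SD(y)/RMSE (some authors divide by SEP instead, which is exactly the kind of definitional choice the review says should be reported). `figures_of_merit`, `ridge_fit`, and `kfold_predictions` are hypothetical helpers.

```python
import numpy as np

def figures_of_merit(y_true, y_pred):
    """Common regression FoM; the RPD definition used here is SD(y) / RMSE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rmse = np.sqrt(np.mean(err ** 2))
    return {
        "RMSE": rmse,
        "bias": np.mean(err),
        "R2": 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2),
        "RPD": np.std(y_true, ddof=1) / rmse,
    }

def ridge_fit(X, y, lam=1.0):
    """Stand-in regression model (ridge on mean-centred data); PLS would be typical for NIRS."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return lambda X_new: (X_new - x_mean) @ beta + y_mean

def kfold_predictions(X, y, k=5, lam=1.0, seed=0):
    """Cross-validated predictions: each sample is predicted by a model that never saw it."""
    idx = np.random.default_rng(seed).permutation(len(y))
    y_cv = np.empty(len(y))
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        y_cv[fold] = ridge_fit(X[train], y[train], lam)(X[fold])
    return y_cv

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 50))                        # synthetic "spectra"
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.3, size=120)
X_cal, y_cal, X_ext, y_ext = X[:90], y[:90], X[90:], y[90:]

cv = figures_of_merit(y_cal, kfold_predictions(X_cal, y_cal))      # internal: RMSECV
ext = figures_of_merit(y_ext, ridge_fit(X_cal, y_cal)(X_ext))      # external: RMSEP
print(f"RMSECV = {cv['RMSE']:.3f}, RMSEP = {ext['RMSE']:.3f}")
```

A random split like this one is only a weak form of external validation; an independent set measured on different batches, days, or instruments better reflects the scenario the review recommends.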
De Novo Design of HIV-1 Integrase-LEDGF/p75 Inhibitors Through Deep Reinforcement Learning and Virtual Screening
IF 2.1 Zone 4 (Chemistry) Q1 SOCIAL WORK Pub Date: 2025-05-12 DOI: 10.1002/cem.70037
Hai-Bo Sun, Hai-Long Wu, Tong Wang, An-Qi Chen, Ru-Qin Yu

Human immunodeficiency virus (HIV) has far-reaching impacts on global public health. Acquired immunodeficiency syndrome (AIDS) has caused millions of deaths worldwide, and thousands of people are still being infected. Developing HIV-1 integrase inhibitors is therefore crucial for controlling AIDS by slowing viral replication and transmission. This study uses a deep reinforcement learning framework to design, de novo, inhibitors of the interaction between HIV-1 integrase and lens epithelium-derived growth factor (LEDGF/p75), and then employs molecular docking to screen for potential therapeutic compounds. First, a molecular generation model based on a long short-term memory (LSTM) network was built and refined through transfer learning to obtain a preliminary generative model. Deep reinforcement learning was then applied, with inhibitory activity as the reward, making the model more likely to generate molecules with desirable properties. The results indicate that the reinforced generative model produces novel, valid SMILES structures with medicinal potential, and molecular docking experiments indicate strong binding affinity between the generated molecules and the target protein. Finally, virtual screening identified six lead compounds with the potential to inhibit the interaction between LEDGF/p75 and HIV-1 integrase, providing an effective and practical strategy for the de novo design of HIV-1 integrase inhibitors.

Journal of Chemometrics, Vol. 39, Issue 5
Citations: 0
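The HIV-1 entry fine-tunes a generator with reinforcement learning, using inhibitory activity as the reward. The abstract does not name the RL algorithm, so the sketch below shows one common choice, the REINFORCE policy gradient with a batch-mean baseline, on a deliberately tiny problem. It rests on loud assumptions: the policy is an independent softmax per token position (not an LSTM), the vocabulary is a toy token set with no SMILES grammar, and `toy_reward` is a hypothetical placeholder for a predicted-activity score such as a QSAR model output.

```python
import numpy as np

VOCAB = np.array(list("CNOc="))  # hypothetical toy token set, not a real SMILES grammar
LENGTH = 10

def toy_reward(tokens):
    """Hypothetical stand-in for a predicted-activity score in [0, 1]."""
    return np.mean(tokens == 1, axis=1)  # fraction of 'N' tokens in each sequence

def reinforce(n_iter=400, batch=64, lr=2.0, seed=0):
    """REINFORCE with a batch-mean baseline on a per-position softmax policy."""
    rng = np.random.default_rng(seed)
    V = len(VOCAB)
    logits = np.zeros((LENGTH, V))  # toy policy parameters, one softmax per position
    history = []
    for _ in range(n_iter):
        probs = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        # Inverse-CDF sampling of a batch of token sequences from the current policy
        cdf = probs.cumsum(axis=1)
        u = rng.random((batch, LENGTH, 1))
        tokens = np.minimum((u > cdf[None, :, :]).sum(axis=2), V - 1)
        rewards = toy_reward(tokens)
        advantage = rewards - rewards.mean()  # baseline reduces gradient variance
        onehot = np.eye(V)[tokens]            # shape (batch, LENGTH, V)
        # d log pi(token) / d logits = onehot - probs at each position
        grad = np.mean(advantage[:, None, None] * (onehot - probs[None]), axis=0)
        logits += lr * grad                   # gradient ascent on expected reward
        history.append(rewards.mean())
    return logits, np.array(history)

logits, history = reinforce()
print(f"mean reward: first 10 iterations {history[:10].mean():.2f}, last 10 {history[-10:].mean():.2f}")
```

In a REINVENT-style setup the same update would be applied to the log-likelihood of each sampled SMILES under the pretrained, transfer-learned LSTM, usually with extra terms that keep the fine-tuned model close to the prior; those details are outside what the abstract states.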