Comprehensive evaluation and systematic comparison of Gaussian process (GP) modelling applications in peptide quantitative structure-activity relationship

Chemometrics and Intelligent Laboratory Systems (IF 3.7; Zone 2, Chemistry; JCR Q2, Automation & Control Systems) Pub Date: 2024-07-31 DOI: 10.1016/j.chemolab.2024.105191
Haiyang Ye, Yunyi Zhang, Zilong Li, Yue Peng, Peng Zhou
Volume 252, Article 105191. Publication type: Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S016974392400131X
Citations: 0

Abstract

Peptide quantitative structure-activity relationship (pQSAR) modelling extends traditional QSAR from small-molecule drugs to bioactive peptides. Because peptides are linear biopolymers that differ fundamentally from small-molecule compounds in structural features such as ordered sequence, large size and intrinsic flexibility, the pQSAR methodology (including structural characterization and regression modelling) needs further development relative to traditional QSAR. The Gaussian process (GP) is a Bayesian machine learning (ML) method for regression problems that mix linear and nonlinear behaviour in complex domains. However, applications of GP regression in QSAR and, particularly, in pQSAR remain largely unexplored. In this work, we carried out a comprehensive pQSAR study with GP regression modelling, aiming at an in-depth evaluation of GP performance across different structural characterizations and a systematic comparison of GP with other routine ML methods. We collected two distinct classes of peptide datasets, comprising 12 panels of sophisticated benchmarks and 46 panels of extended samples respectively, which together contain 8804 peptide samples and yield 522 regression models. Our study indicates that GP generally provides an effective solution for many pQSAR problems and has the potential to advance ML regression modelling in this area; its performance is comparable with or even better than that of widely used methods on both the sophisticated benchmarks and the extended samples. In addition, GP offers several advantages over traditional ML methods, such as hyperparameter self-consistency, overfitting resistance, interpretable output and estimable uncertainty.
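To make the "estimable uncertainty" point concrete, the following is a minimal sketch of exact GP regression with a squared-exponential kernel, written with NumPy only. It is not the authors' pipeline: the descriptors and activities are synthetic stand-ins, the function names (rbf_kernel, gp_predict) are hypothetical, and the kernel hyperparameters are fixed by hand, whereas in practice they would typically be fitted, for example by maximizing the marginal likelihood.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between the rows of a and b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_predict(x_train, y_train, x_test, noise=1e-2, length_scale=1.0):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    k_tt = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_st = rbf_kernel(x_test, x_train, length_scale)
    chol = np.linalg.cholesky(k_tt)  # k_tt = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_st @ alpha
    v = np.linalg.solve(chol, k_st.T)
    prior_var = rbf_kernel(x_test, x_test, length_scale).diagonal()
    var = prior_var - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


# Synthetic data (assumption): 30 "peptides" described by 3 numeric descriptors,
# with a made-up activity that mixes a nonlinear and a linear term.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(30, 3))
y_train = np.sin(x_train[:, 0]) + 0.5 * x_train[:, 1]

x_near = x_train[:5] + 0.01  # queries close to known samples
x_far = x_train[:5] + 10.0   # queries far outside the training region

mean_near, std_near = gp_predict(x_train, y_train, x_near)
mean_far, std_far = gp_predict(x_train, y_train, x_far)

print("std near training data:", np.round(std_near, 3))
print("std far from training data:", np.round(std_far, 3))
```

The predictive standard deviation stays small near the training samples and returns to the prior level far away from them, which is the behaviour that lets a GP model flag predictions made outside the region covered by its training data.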

Source journal
CiteScore: 7.50
Self-citation rate: 7.70%
Articles published: 169
Review time: 3.4 months
Journal description: Chemometrics and Intelligent Laboratory Systems publishes original research papers, short communications, reviews, tutorials and Original Software Publications reporting on development of novel statistical, mathematical, or computer techniques in Chemistry and related disciplines. Chemometrics is the chemical discipline that uses mathematical and statistical methods to design or select optimal procedures and experiments, and to provide maximum chemical information by analysing chemical data. The journal deals with the following topics:
1) Development of new statistical, mathematical and chemometrical methods for Chemistry and related fields (Environmental Chemistry, Biochemistry, Toxicology, System Biology, -Omics, etc.)
2) Novel applications of chemometrics to all branches of Chemistry and related fields (typical domains of interest are: process data analysis, experimental design, data mining, signal processing, supervised modelling, decision making, robust statistics, mixture analysis, multivariate calibration etc.) Routine applications of established chemometrical techniques will not be considered.
3) Development of new software that provides novel tools or truly advances the use of chemometrical methods.
4) Well characterized data sets to test performance for the new methods and software.
The journal complies with International Committee of Medical Journal Editors' Uniform requirements for manuscripts.