QSPRpred: a Flexible Open-Source Quantitative Structure-Property Relationship Modelling Tool

IF 7.1 | Zone 2, Chemistry | Q1 CHEMISTRY, MULTIDISCIPLINARY | Journal of Cheminformatics | Pub Date: 2024-11-14 | DOI: 10.1186/s13321-024-00908-y
Helle W. van den Maagdenberg, Martin Šícho, David Alencar Araripe, Sohvi Luukkonen, Linde Schoenmaker, Michiel Jespers, Olivier J. M. Béquignon, Marina Gorostiola González, Remco L. van den Broek, Andrius Bernatavicius, J. G. Coen van Hasselt, Piet H. van der Graaf, Gerard J. P. van Westen

Abstract

Building reliable and robust quantitative structure–property relationship (QSPR) models is a challenging task. First, the experimental data needs to be obtained, analyzed and curated. Second, the number of available methods is continuously growing, and evaluating different algorithms and methodologies can be arduous. Finally, researchers must ensure the reproducibility of their models and facilitate their transfer into practice. In this work, we introduce QSPRpred, a toolkit for analysis of bioactivity data sets and QSPR modelling, which attempts to address these challenges. QSPRpred’s modular Python API enables users to intuitively describe different parts of a modelling workflow using a plethora of pre-implemented components, and also integrates customized implementations in a “plug-and-play” manner. QSPRpred data sets and models are directly serializable: models are saved together with all required data pre-processing steps, so they can be readily reproduced and, after training, put into operation to make predictions on new compounds directly from SMILES strings. The general-purpose character of QSPRpred is also demonstrated by its support for multi-task and proteochemometric modelling. The package is extensively documented and comes with a large collection of tutorials to help new users. In this paper, we describe all of QSPRpred’s functionalities and conduct a small benchmarking case study to illustrate how different components can be leveraged to compare a diverse set of models. QSPRpred is fully open-source and available at https://github.com/CDDLeiden/QSPRpred.


Scientific Contribution

QSPRpred aims to provide a complex, but comprehensive Python API to conduct all tasks encountered in QSPR modelling from data preparation and analysis to model creation and model deployment. In contrast to similar packages, QSPRpred offers a wider and more exhaustive range of capabilities and integrations with many popular packages that also go beyond QSPR modelling. A significant contribution of QSPRpred is also in its automated and highly standardized serialization scheme, which significantly improves reproducibility and transferability of models.
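
The idea of saving a model together with its pre-processing steps, so that predictions can later be made directly from SMILES strings, can be illustrated with a minimal sketch. This is not QSPRpred's API: `ToyModel`, `FEATURIZERS`, `register`, and the two toy descriptors are hypothetical names invented for illustration, and the descriptors merely count characters in the SMILES string rather than computing real chemical features.

```python
import json
from dataclasses import dataclass

# Hypothetical registry of "plug-and-play" featurizers (not QSPRpred code).
# Each entry maps a name to a function: SMILES string -> list of floats.
FEATURIZERS = {}


def register(name):
    """Decorator that adds a featurizer function to the registry."""
    def wrap(fn):
        FEATURIZERS[name] = fn
        return fn
    return wrap


@register("atom_counts")
def atom_counts(smiles):
    # Toy descriptor: how often the characters C, N and O occur in the string.
    return [float(smiles.count(ch)) for ch in "CNO"]


@register("length")
def length(smiles):
    # Toy descriptor: number of characters in the SMILES string.
    return [float(len(smiles))]


@dataclass
class ToyModel:
    """Linear model that stores the name of its featurizer alongside its weights."""
    featurizer: str
    weights: list
    bias: float = 0.0

    def predict(self, smiles_list):
        featurize = FEATURIZERS[self.featurizer]
        return [
            sum(w * x for w, x in zip(self.weights, featurize(s))) + self.bias
            for s in smiles_list
        ]

    def to_json(self):
        # The pre-processing choice is serialized together with the parameters.
        return json.dumps(
            {"featurizer": self.featurizer, "weights": self.weights, "bias": self.bias}
        )

    @classmethod
    def from_json(cls, text):
        return cls(**json.loads(text))


if __name__ == "__main__":
    model = ToyModel(featurizer="atom_counts", weights=[0.5, -1.0, 0.25], bias=1.0)
    restored = ToyModel.from_json(model.to_json())
    smiles = ["CCO", "CC(=O)N"]
    # The restored model needs only SMILES strings to reproduce the predictions.
    print(restored.predict(smiles) == model.predict(smiles))
```

Because the serialized form records which featurizer was used, whoever loads the model does not need to know how the inputs were prepared. Adding a new descriptor only requires registering one more function.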

Source journal: Journal of Cheminformatics (CHEMISTRY, MULTIDISCIPLINARY; COMPUTER SCIENCE, INFORMATION SYSTEMS)
CiteScore: 14.10
Self-citation rate: 7.00%
Articles published: 82
Review time: 3 months
About the journal: Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling. Coverage includes, but is not limited to: chemical information systems, software and databases, and molecular modelling, chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases, computer and molecular graphics, computer-aided molecular design, expert systems, QSAR, and data mining techniques.