Deepmol: an automated machine and deep learning framework for computational chemistry

IF 5.7 | Zone 2, Chemistry | Q1 Chemistry, Multidisciplinary | Journal of Cheminformatics | Pub Date: 2024-12-05 | DOI: 10.1186/s13321-024-00937-7
João Correia, João Capela, Miguel Rocha

Abstract

Computational chemistry has evolved significantly with the introduction of Machine Learning (ML) technologies. Despite the potential of ML to revolutionize the field, researchers are often held back by obstacles such as the complexity of selecting optimal algorithms, the automation of data pre-processing steps, the need for adaptive feature engineering, and the difficulty of ensuring consistent model performance across different datasets. DeepMol addresses these issues as an Automated ML (AutoML) tool that automates critical steps of the ML pipeline. DeepMol rapidly and automatically identifies the most effective data representation, pre-processing methods and model configurations for a specific molecular property/activity prediction problem. On 22 benchmark datasets, DeepMol obtained pipelines competitive with those that require time-consuming feature engineering, model design and selection processes. As one of the first AutoML tools developed specifically for the computational chemistry domain, DeepMol stands out with its open-source code, in-depth tutorials, detailed documentation, and examples of real-world applications, all available at https://github.com/BioSystemsUM/DeepMol and https://deepmol.readthedocs.io/en/latest/. By introducing AutoML as a groundbreaking feature in computational chemistry, DeepMol establishes itself as the pioneering state-of-the-art tool in the field.
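The search described in the abstract, which picks a data representation and a model configuration for a given prediction problem, can be illustrated with a minimal sketch. This is not DeepMol's API or its optimizer: the toy data, the two featurizers, the two models, and the exhaustive grid search are all assumptions made for illustration, using only the Python standard library.

```python
from itertools import product
from statistics import mean

# Toy data (illustrative only): SMILES-like strings with a made-up numeric property.
DATA = [("CCO", 3.0), ("CCCC", 4.0), ("CO", 2.0), ("CCCCCC", 6.0),
        ("CCN", 3.0), ("C", 1.0), ("CCCCC", 5.0), ("CCOC", 4.0)]


# Hypothetical candidate representations: each maps a string to one number.
def featurize_length(smiles):
    return float(len(smiles))


def featurize_carbons(smiles):
    return float(smiles.count("C"))


class MeanModel:
    """Baseline: always predicts the mean of the training targets."""

    def fit(self, xs, ys):
        self.mean_ = mean(ys)
        return self

    def predict(self, xs):
        return [self.mean_ for _ in xs]


class LinearModel:
    """One-feature ordinary least squares."""

    def fit(self, xs, ys):
        mx, my = mean(xs), mean(ys)
        var = sum((x - mx) * (x - mx) for x in xs)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.slope_ = cov / var if var else 0.0
        self.intercept_ = my - self.slope_ * mx
        return self

    def predict(self, xs):
        return [self.slope_ * x + self.intercept_ for x in xs]


def mae(y_true, y_pred):
    """Objective function: mean absolute error (lower is better)."""
    return mean(abs(a - b) for a, b in zip(y_true, y_pred))


def search(train, valid, featurizers, models, objective=mae):
    """Score every (featurizer, model) pair on the validation split; return the best."""
    results = []
    for (f_name, f), (m_name, m_cls) in product(featurizers.items(), models.items()):
        model = m_cls().fit([f(s) for s, _ in train], [y for _, y in train])
        preds = model.predict([f(s) for s, _ in valid])
        results.append((objective([y for _, y in valid], preds), f_name, m_name))
    return min(results)


FEATURIZERS = {"length": featurize_length, "carbons": featurize_carbons}
MODELS = {"mean": MeanModel, "linear": LinearModel}

best = search(DATA[:5], DATA[5:], FEATURIZERS, MODELS)
print(best)  # (validation error, featurizer name, model name)
```

A real AutoML tool would replace the exhaustive loop with a smarter optimizer and a much larger space of representations, pre-processing steps and models; the structure of "propose a pipeline, score it with an objective function, keep the best" is the part this sketch is meant to show.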

Scientific contribution

DeepMol aims to provide an integrated AutoML framework for computational chemistry. Its integrated pipeline serialization makes it a more robust alternative to other tools and enables seamless deployment through the fit, transform, and predict paradigms. It uniquely supports both conventional and deep learning models for regression, classification and multi-task problems, offering unmatched flexibility compared to other AutoML tools. DeepMol's predefined configurations and customizable objective functions make it accessible to users at all skill levels while enabling efficient and reproducible workflows. Benchmarking on diverse datasets demonstrated its ability to deliver optimized pipelines and superior performance across various molecular machine-learning tasks.
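The fit, transform, and predict paradigm with pipeline serialization mentioned above can be sketched in a few lines. The classes below are hypothetical and are not DeepMol's implementation; JSON is used here as a stand-in serialization format, since the abstract does not say how DeepMol stores pipelines. The descriptor values and labels are made up.

```python
import json
from statistics import mean, pstdev


class Standardizer:
    """Transformer step: learns mean and std in fit, applies them in transform."""

    def fit(self, xs):
        self.mean_ = mean(xs)
        self.std_ = pstdev(xs) or 1.0  # guard against zero variance
        return self

    def transform(self, xs):
        return [(x - self.mean_) / self.std_ for x in xs]


class NearestCentroid:
    """Predictor step: assigns each sample to the class with the closest mean."""

    def fit(self, xs, ys):
        self.centroids_ = {
            label: mean(x for x, y in zip(xs, ys) if y == label)
            for label in sorted(set(ys))
        }
        return self

    def predict(self, xs):
        return [
            min(self.centroids_, key=lambda label: abs(x - self.centroids_[label]))
            for x in xs
        ]


class Pipeline:
    """Chains one transformer and one predictor behind fit/transform/predict."""

    def __init__(self):
        self.scaler = Standardizer()
        self.classifier = NearestCentroid()

    def fit(self, xs, ys):
        self.classifier.fit(self.scaler.fit(xs).transform(xs), ys)
        return self

    def transform(self, xs):
        return self.scaler.transform(xs)

    def predict(self, xs):
        return self.classifier.predict(self.transform(xs))

    def to_json(self):
        # Serialize only the fitted parameters of each step.
        return json.dumps({"scaler": vars(self.scaler),
                           "classifier": vars(self.classifier)})

    @classmethod
    def from_json(cls, text):
        state = json.loads(text)
        pipeline = cls()
        vars(pipeline.scaler).update(state["scaler"])
        vars(pipeline.classifier).update(state["classifier"])
        return pipeline


# Made-up single-descriptor training set with two activity classes.
train_x = [1.0, 1.2, 0.9, 3.0, 3.2, 2.9]
train_y = ["inactive", "inactive", "inactive", "active", "active", "active"]

pipeline = Pipeline().fit(train_x, train_y)
restored = Pipeline.from_json(pipeline.to_json())  # round trip, as in deployment

new_x = [1.1, 3.1]
print(restored.predict(new_x))
```

The point of the round trip is that the restored pipeline carries every fitted parameter, so raw inputs can be passed straight to predict without repeating training or pre-processing by hand.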

Source journal
Journal of Cheminformatics
Categories: Chemistry, Multidisciplinary; Computer Science, Information Systems
CiteScore: 14.10
Self-citation rate: 7.00%
Articles published: 82
Review time: 3 months
Journal description: Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling. Coverage includes, but is not limited to: chemical information systems, software and databases, and molecular modelling, chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases, computer and molecular graphics, computer-aided molecular design, expert systems, QSAR, and data mining techniques.