Natural products subsets: Generation and characterization

Ana L. Chávez-Hernández, José L. Medina-Franco
{"title":"Natural products subsets: Generation and characterization","authors":"Ana L. Chávez-Hernández,&nbsp;José L. Medina-Franco","doi":"10.1016/j.ailsci.2023.100066","DOIUrl":null,"url":null,"abstract":"<div><p>Natural products are attractive for drug discovery applications because of their distinctive chemical structures, such as an overall large fraction of sp<sup>3</sup> carbon atoms, chiral centers (both features associated with structural complexity), large chemical scaffolds, and diversity of functional groups. Furthermore, natural products are used in <em>de novo</em> design and have inspired the development of pseudo-natural products using generative models. Public databases such as the Collection of Open NatUral ProdUcTs and the Universal Natural Product database (UNPD) are rich sources of structures to be used in generative models and other applications. In this work, we report the selection and characterization of the most diverse compounds of natural products from the UNPD using the MaxMin algorithm. The subsets generated with 14,994, 7,497, and 4,998 compounds are publicly available at <span>https://github.com/DIFACQUIM/Natural-products-subsets-generation</span><svg><path></path></svg>. We anticipate that the subsets will be particularly useful in building generative models based on natural products by research groups, particularly those with limited access to extensive supercomputer resources.</p></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial intelligence in the life sciences","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2667318523000107","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Natural products are attractive for drug discovery applications because of their distinctive chemical structures, such as an overall large fraction of sp3 carbon atoms, chiral centers (both features associated with structural complexity), large chemical scaffolds, and diversity of functional groups. Furthermore, natural products are used in de novo design and have inspired the development of pseudo-natural products using generative models. Public databases such as the Collection of Open NatUral ProdUcTs and the Universal Natural Product database (UNPD) are rich sources of structures to be used in generative models and other applications. In this work, we report the selection and characterization of the most diverse compounds of natural products from the UNPD using the MaxMin algorithm. The subsets generated with 14,994, 7,497, and 4,998 compounds are publicly available at https://github.com/DIFACQUIM/Natural-products-subsets-generation. We anticipate that the subsets will be particularly useful in building generative models based on natural products by research groups, particularly those with limited access to extensive supercomputer resources.

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
天然产物子集:生成和表征
天然产物具有独特的化学结构,如大量的sp3碳原子、手性中心(这两个特征都与结构复杂性有关)、大型化学支架和功能基团的多样性,因此对药物发现应用具有吸引力。此外,天然产品被用于从头设计,并激发了使用生成模型的伪天然产品的发展。公共数据库,如开放天然产物集和通用天然产物数据库(UNPD)是生成模型和其他应用中使用的结构的丰富来源。在这项工作中,我们报告了使用MaxMin算法从UNPD中选择和表征最多样化的天然产物化合物。由14,994、7,497和4,998个化合物生成的子集可在https://github.com/DIFACQUIM/Natural-products-subsets-generation上公开获得。我们预计,这些子集将在研究小组建立基于自然产物的生成模型时特别有用,特别是那些无法获得大量超级计算机资源的研究小组。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Artificial intelligence in the life sciences
Artificial intelligence in the life sciences Pharmacology, Biochemistry, Genetics and Molecular Biology (General), Computer Science Applications, Health Informatics, Drug Discovery, Veterinary Science and Veterinary Medicine (General)
CiteScore
5.00
自引率
0.00%
发文量
0
审稿时长
15 days
期刊最新文献
Pharmacological profiles of neglected tropical disease drugs DTA Atlas: A massive-scale drug repurposing database Modeling PROTAC degradation activity with machine learning Machine learning proteochemometric models for Cereblon glue activity predictions Editorial Board
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1