Natural products subsets: Generation and characterization

Artificial intelligence in the life sciences, volume 3, Article 100066. Pub Date: 2023-02-26. DOI: 10.1016/j.ailsci.2023.100066

Ana L. Chávez-Hernández, José L. Medina-Franco

Citations: 2

Abstract

Natural products are attractive for drug discovery applications because of their distinctive chemical structures: an overall large fraction of sp³ carbon atoms and chiral centers (both features associated with structural complexity), large chemical scaffolds, and a diversity of functional groups. Furthermore, natural products are used in de novo design and have inspired the development of pseudo-natural products using generative models. Public databases such as the Collection of Open NatUral ProdUcTs and the Universal Natural Product database (UNPD) are rich sources of structures for generative models and other applications. In this work, we report the selection and characterization of the most diverse natural product compounds from the UNPD using the MaxMin algorithm. The resulting subsets, with 14,994, 7,497, and 4,998 compounds, are publicly available at https://github.com/DIFACQUIM/Natural-products-subsets-generation. We anticipate that the subsets will be particularly useful to research groups building generative models based on natural products, especially those with limited access to extensive supercomputer resources.
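The abstract names the MaxMin algorithm but gives no implementation details, so the following is only a minimal sketch of the general greedy MaxMin idea, not the authors' pipeline. It assumes each compound is represented by a fingerprint stored as a set of "on" bit indices and that dissimilarity is measured as 1 minus the Tanimoto similarity. The fingerprints, the function names `tanimoto_distance` and `maxmin_pick`, and the choice of the first compound as the seed are all hypothetical and chosen for illustration.

```python
from typing import FrozenSet, List, Sequence

# Hypothetical stand-in for a molecular fingerprint: the set of "on" bit indices.
Fingerprint = FrozenSet[int]


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """Return 1 - Tanimoto similarity of two bit sets."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def maxmin_pick(fps: Sequence[Fingerprint], n_pick: int, first: int = 0) -> List[int]:
    """Greedy MaxMin: repeatedly add the item farthest from the picked set."""
    if not 0 < n_pick <= len(fps):
        raise ValueError("n_pick must be between 1 and len(fps)")
    picked = [first]  # assumed seed; real workflows often pick it at random
    # min_dist[i] = distance from item i to its nearest already-picked item
    min_dist = [tanimoto_distance(fp, fps[first]) for fp in fps]
    while len(picked) < n_pick:
        picked_set = set(picked)
        # Choose the candidate whose nearest picked neighbour is farthest away.
        nxt = max(
            (i for i in range(len(fps)) if i not in picked_set),
            key=lambda i: min_dist[i],
        )
        picked.append(nxt)
        # Update nearest-picked distances with the newly added item.
        for i, fp in enumerate(fps):
            min_dist[i] = min(min_dist[i], tanimoto_distance(fp, fps[nxt]))
    return picked


# Toy data only; these are not real natural product fingerprints.
toy_fps = [
    frozenset({1, 2, 3}),
    frozenset({1, 2, 4}),
    frozenset({7, 8, 9}),
    frozenset({1, 2, 3, 4}),
    frozenset({3, 7, 10}),
]
subset = maxmin_pick(toy_fps, n_pick=3)
print(subset)
```

Keeping a running `min_dist` list means each new pick costs one pass over the data rather than a comparison against every previously picked compound, which is the usual reason this greedy scheme scales to large compound collections.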

Source journal

Artificial intelligence in the life sciences

Subject areas: Pharmacology, Biochemistry, Genetics and Molecular Biology (General), Computer Science Applications, Health Informatics, Drug Discovery, Veterinary Science and Veterinary Medicine (General)

CiteScore: 5.00

Self-citation rate: 0.00%

Review time: 15 days