A compound-target pairs dataset: differences between drugs, clinical candidates and other bioactive compounds.

IF 5.8 · Zone 2 (multidisciplinary journals) · JCR Q1, Multidisciplinary Sciences · Scientific Data · Pub Date: 2024-10-21 · DOI: 10.1038/s41597-024-03582-9

A Lina Heinzke, Barbara Zdrazil, Paul D Leeson, Robert J Young, Axel Pahl, Herbert Waldmann, Andrew R Leach

Scientific Data, volume 11, issue 1, article 1160. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11494047/pdf/

Citations: 0



Abstract

A better understanding of what makes a compound a successful drug candidate is crucial for reducing the high attrition rates in drug discovery. Analyses of the differences between active compounds, clinical candidates and drugs require high-quality datasets. However, most datasets from drug discovery programs are not openly available. This work introduces a dataset of compound-target pairs extracted from the open-source bioactivity database ChEMBL (release 32). Compound-target pairs in the dataset either have at least one measured activity or are part of the manually curated set of known interactions in ChEMBL. Known interactions between drugs or clinical candidates and targets are specifically annotated to facilitate analyses of differences between drugs, clinical candidates, and other active compounds. In total, the dataset comprises 614,594 compound-target pairs, of which 5,109 are known interactions between drugs and targets and 3,932 are known interactions between clinical candidates and targets. The extraction is automated and fully reproducible. We provide not only the datasets but also the code to rerun the analyses with other ChEMBL releases.
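The inclusion rule and the reported counts in the abstract can be illustrated with a minimal Python sketch. This is not the authors' extraction code: the `Pair` record, its field names, the `annotate` labels and the toy rows are all hypothetical and exist only to show the logic "at least one measured activity, or part of the curated known interactions". Only the three totals at the end come from the abstract.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """Hypothetical record for one compound-target pair (not the real schema)."""
    compound: str
    target: str
    n_activities: int        # number of measured activities for this pair
    known_interaction: bool  # in the manually curated set of known interactions
    compound_class: str      # assumed labels: "drug", "clinical_candidate", "other"


def is_included(pair: Pair) -> bool:
    # Inclusion rule from the abstract: at least one measured activity
    # OR membership in the curated set of known interactions.
    return pair.n_activities >= 1 or pair.known_interaction


def annotate(pair: Pair) -> str:
    # Assumed annotation scheme: flag known interactions of drugs and
    # clinical candidates, treat everything else as "other_active".
    if pair.known_interaction and pair.compound_class in ("drug", "clinical_candidate"):
        return f"known_{pair.compound_class}_interaction"
    return "other_active"


# Toy data, invented for illustration only.
TOY_PAIRS = [
    Pair("cpd_A", "tgt_1", 3, True, "drug"),
    Pair("cpd_B", "tgt_1", 0, True, "clinical_candidate"),
    Pair("cpd_C", "tgt_2", 5, False, "other"),
    Pair("cpd_D", "tgt_2", 0, False, "other"),  # no activity, not curated: excluded
]

included = [p for p in TOY_PAIRS if is_included(p)]
counts = Counter(annotate(p) for p in included)

# Totals reported in the abstract.
TOTAL_PAIRS = 614_594
DRUG_INTERACTIONS = 5_109
CANDIDATE_INTERACTIONS = 3_932

drug_share = DRUG_INTERACTIONS / TOTAL_PAIRS
candidate_share = CANDIDATE_INTERACTIONS / TOTAL_PAIRS

print(f"included toy pairs: {len(included)}")
print(f"drug-target known interactions: {drug_share:.2%} of all pairs")
print(f"candidate-target known interactions: {candidate_share:.2%} of all pairs")
```

The share calculation makes the scale explicit: known drug and clinical-candidate interactions together account for well under 2% of the pairs, so the bulk of the dataset consists of other active compounds.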


Source journal

Scientific Data Social Sciences-Education

CiteScore: 11.20

Self-citation rate: 4.10%

Articles published: 689

Review time: 16 weeks

About the journal: Scientific Data is an open-access journal focused on data, publishing descriptions of research datasets and articles on data sharing across natural sciences, medicine, engineering, and social sciences. Its goal is to enhance the sharing and reuse of scientific data, encourage broader data sharing, and acknowledge those who share their data. The journal primarily publishes Data Descriptors, which offer detailed descriptions of research datasets, including data collection methods and technical analyses validating data quality. These descriptors aim to facilitate data reuse rather than to test hypotheses or present new interpretations, methods, or in-depth analyses.