Mining drug-target interactions from biomedical literature using chemical and gene descriptions-based ensemble transformer model.

Bioinformatics Advances · IF 2.4 · Q2 (Mathematical & Computational Biology) · Pub Date: 2024-07-22 · eCollection Date: 2024-01-01 · DOI: 10.1093/bioadv/vbae106
Jehad Aldahdooh, Ziaurrehman Tanoli, Jing Tang
{"title":"Mining drug-target interactions from biomedical literature using chemical and gene descriptions-based ensemble transformer model.","authors":"Jehad Aldahdooh, Ziaurrehman Tanoli, Jing Tang","doi":"10.1093/bioadv/vbae106","DOIUrl":null,"url":null,"abstract":"<p><strong>Motivation: </strong>Drug-target interactions (DTIs) play a pivotal role in drug discovery, as it aims to identify potential drug targets and elucidate their mechanism of action. In recent years, the application of natural language processing (NLP), particularly when combined with pre-trained language models, has gained considerable momentum in the biomedical domain, with the potential to mine vast amounts of texts to facilitate the efficient extraction of DTIs from the literature.</p><p><strong>Results: </strong>In this article, we approach the task of DTIs as an entity-relationship extraction problem, utilizing different pre-trained transformer language models, such as BERT, to extract DTIs. Our results indicate that an ensemble approach, by combining gene descriptions from the Entrez Gene database with chemical descriptions from the Comparative Toxicogenomics Database (CTD), is critical for achieving optimal performance. The proposed model achieves an <i>F</i>1 score of 80.6 on the hidden DrugProt test set, which is the top-ranked performance among all the submitted models in the official evaluation. Furthermore, we conduct a comparative analysis to evaluate the effectiveness of various gene textual descriptions sourced from Entrez Gene and UniProt databases to gain insights into their impact on the performance. Our findings highlight the potential of NLP-based text mining using gene and chemical descriptions to improve drug-target extraction tasks.</p><p><strong>Availability and implementation: </strong>Datasets utilized in this study are accessible at https://dtis.drugtargetcommons.org/.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":null,"pages":null},"PeriodicalIF":2.4000,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11293871/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bioinformatics advances","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/bioadv/vbae106","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Motivation: Drug-target interactions (DTIs) play a pivotal role in drug discovery, as identifying them helps pinpoint potential drug targets and elucidate their mechanisms of action. In recent years, the application of natural language processing (NLP), particularly when combined with pre-trained language models, has gained considerable momentum in the biomedical domain, offering the potential to mine vast amounts of text and thereby enable efficient extraction of DTIs from the literature.

Results: In this article, we frame DTI extraction as an entity-relationship extraction problem and use different pre-trained transformer language models, such as BERT, to extract DTIs. Our results indicate that an ensemble approach that combines gene descriptions from the Entrez Gene database with chemical descriptions from the Comparative Toxicogenomics Database (CTD) is critical for achieving optimal performance. The proposed model achieves an F1 score of 80.6 on the hidden DrugProt test set, the top-ranked performance among all models submitted to the official evaluation. Furthermore, we conduct a comparative analysis of gene textual descriptions sourced from the Entrez Gene and UniProt databases to gain insight into their impact on performance. Our findings highlight the potential of NLP-based text mining with gene and chemical descriptions to improve drug-target extraction tasks.
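
The following is a minimal, illustrative sketch of the kind of description-augmented relation classification outlined above, built on the Hugging Face transformers library. The checkpoint name, the @CHEMICAL$/@GENE$ entity markers, the label count, and the way descriptions are packed into the second input segment are assumptions for demonstration, not the authors' exact configuration (which ensembles several such models).

```python
# Sketch: BERT-based relation classification for one chemical-gene candidate pair,
# with external chemical (CTD) and gene (Entrez Gene) descriptions appended as a
# second input segment. Illustrative assumptions: checkpoint, marker tokens, and
# label count (13 DrugProt relation types plus a "no relation" class).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "dmis-lab/biobert-base-cased-v1.1"  # assumed biomedical BERT variant
NUM_LABELS = 14

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Note: the classification head is randomly initialized here; fine-tuning on the
# DrugProt training data would be required before predictions are meaningful.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=NUM_LABELS)
model.eval()

def classify_pair(sentence: str, chem_desc: str, gene_desc: str) -> int:
    """Predict a relation label for the entity pair marked in `sentence`.

    The sentence forms segment A; the chemical and gene descriptions form
    segment B, so BERT's segment embeddings keep them distinguishable.
    """
    inputs = tokenizer(
        sentence,
        chem_desc + " " + gene_desc,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1))

# Hypothetical usage: entity mentions are wrapped in marker tokens beforehand.
label_id = classify_pair(
    "@CHEMICAL$ potently inhibits @GENE$ kinase activity in vitro.",
    "Imatinib is a small-molecule tyrosine kinase inhibitor ...",
    "ABL1 encodes a cytoplasmic and nuclear protein tyrosine kinase ...",
)
print(label_id)
```

In an ensemble setting, one could train several such classifiers on different description sources (e.g. Entrez Gene vs. UniProt) and average or vote over their logits; as reported above, the combination of Entrez Gene and CTD descriptions was the configuration that performed best.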

Availability and implementation: Datasets utilized in this study are accessible at https://dtis.drugtargetcommons.org/.
