Discovering nuclear localization signal universe through a novel deep learning model with interpretable attention units

Yifan Li, Xiaoyong Pan, Hong-Bin Shen
bioRxiv - Bioinformatics, published 2024-08-10. DOI: 10.1101/2024.08.10.606103 (https://doi.org/10.1101/2024.08.10.606103)

Abstract

Nuclear localization signals (NLSs) are essential peptide fragments within proteins that play a decisive role in guiding proteins into the cell nucleus. Determining the existence and precise locations of NLSs experimentally is time-consuming and complicated, so experimentally validated NLS fragments are scarce. Consequently, annotated NLS datasets are relatively small, which presents challenges for data-driven methods. In this study, we propose an interpretable approach, NLSExplorer, which combines large-scale protein language models, used to capture crucial biological information, with a novel attention-based deep network for NLS identification. By utilizing the knowledge retrieved from protein language models, NLSExplorer achieves superior predictive performance compared to existing methods on two NLS benchmark datasets. Additionally, NLSExplorer is able to detect various kinds of segments highly correlated with nuclear transport, such as nuclear export signals. We employ NLSExplorer to investigate potential NLSs and other domains that are important for nuclear transport in nucleus-localized proteins within the Swiss-Prot database. Further comprehensive pattern analysis of all these segments uncovers a potential NLS space and the internal relationships among important nuclear transport segments for 416 species. This study not only introduces a powerful tool for predicting and exploring NLS space, but also offers a versatile network that detects characteristic domains and motifs of NLSs.
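The abstract does not describe how the attention units are read out to locate segments. As a minimal sketch only, and not the NLSExplorer implementation, the following assumes each residue already has a raw attention score (hand-made here rather than produced by a protein language model) and shows one simple way to turn such scores into a candidate segment: normalise them with a softmax, then pick the contiguous window that carries the most attention mass. The names `softmax` and `top_window`, the window width, and the toy sequence and scores are all hypothetical.

```python
import math


def softmax(scores):
    """Turn raw per-residue scores into attention weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_window(weights, width):
    """Return (start, end, mass) of the contiguous window with the most attention.

    `end` is exclusive, so sequence[start:end] is the highlighted segment.
    """
    if not 0 < width <= len(weights):
        raise ValueError("width must be between 1 and len(weights)")
    mass = sum(weights[:width])
    best_start, best_mass = 0, mass
    for start in range(1, len(weights) - width + 1):
        # Slide the window one residue to the right.
        mass += weights[start + width - 1] - weights[start - 1]
        if mass > best_mass:
            best_start, best_mass = start, mass
    return best_start, best_start + width, best_mass


# Toy input: the sequence and the scores are made up for illustration only.
sequence = "MSGAPKKKRKVEDLTA"
raw_scores = [3.0 if residue in "KR" else 0.0 for residue in sequence]

weights = softmax(raw_scores)
start, end, mass = top_window(weights, width=5)
candidate = sequence[start:end]
print(candidate, start, end)
```

In this toy setup the basic residues (K and R) are given high scores by hand, so the selected window is simply the lysine- and arginine-rich stretch. A trained model would supply the scores instead, and the fixed window width is a simplification chosen for the sketch.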