Discovering nuclear localization signal universe through a novel deep learning model with interpretable attention units

bioRxiv - Bioinformatics. Pub Date: 2024-08-10. DOI: 10.1101/2024.08.10.606103

Yifan Li, Xiaoyong Pan, Hong-Bin Shen



Abstract

Nuclear localization signals (NLSs) are essential peptide fragments within proteins that play a decisive role in guiding proteins into the cell nucleus. Determining the existence and precise locations of NLSs experimentally is time-consuming and complicated, so experimentally validated NLS fragments are scarce. Consequently, annotated NLS datasets are relatively small, which presents challenges for data-driven methods. In this study, we propose an interpretable approach, NLSExplorer, which combines large-scale protein language models, used to capture crucial biological information, with a novel attention-based deep network for NLS identification. By utilizing the knowledge retrieved from protein language models, NLSExplorer achieves superior predictive performance compared to existing methods on two NLS benchmark datasets. Additionally, NLSExplorer can detect various kinds of segments highly correlated with nuclear transport, such as nuclear export signals. We employ NLSExplorer to investigate potential NLSs and other domains that are important for nuclear transport in nucleus-localized proteins within the Swiss-Prot database. Further comprehensive pattern analysis of all these segments uncovers a potential NLS space and the internal relationships among important nuclear transport segments for 416 species. This study not only introduces a powerful tool for predicting and exploring NLS space, but also offers a versatile network that detects characteristic domains and motifs of NLSs.
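The abstract does not describe the internals of NLSExplorer, so the following is only a minimal sketch of the general idea it names: attention weights computed over per-residue embeddings can point to candidate segments. Everything here is an assumption made for illustration, including the toy sequence, the two-dimensional `toy_embedding` standing in for protein language model output, the hand-picked `query` vector standing in for a learned attention unit, and the fixed window width. It is not the authors' architecture.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(embeddings, query):
    """Scaled dot-product attention of one query vector over per-residue embeddings."""
    scale = math.sqrt(len(query))
    scores = [
        sum(e_i * q_i for e_i, q_i in zip(emb, query)) / scale
        for emb in embeddings
    ]
    return softmax(scores)


def top_window(weights, width):
    """Return (start, end, mass) for the contiguous window with the largest summed weight."""
    mass = sum(weights[:width])
    best_start, best_mass = 0, mass
    for start in range(1, len(weights) - width + 1):
        # Slide the window one residue to the right.
        mass += weights[start + width - 1] - weights[start - 1]
        if mass > best_mass:
            best_start, best_mass = start, mass
    return best_start, best_start + width, best_mass


def toy_embedding(residue):
    """Hypothetical 2-D embedding: [is basic (K/R), is acidic (D/E)]."""
    return [1.0 if residue in "KR" else 0.0, 1.0 if residue in "DE" else 0.0]


# Toy sequence with a lysine/arginine-rich stretch (an assumption for illustration).
sequence = "MASPKKKRKVEDGS"
embeddings = [toy_embedding(r) for r in sequence]

# Hypothetical query vector; in a trained network this would be learned.
query = [2.0, -1.0]

weights = attention_weights(embeddings, query)
start, end, mass = top_window(weights, width=5)
print(f"candidate segment: {sequence[start:end]} (positions {start}-{end - 1})")
print(f"share of total attention: {mass:.3f}")
```

Because the weights sum to one, the window's summed weight can be read as the share of attention that falls on that segment, which is what makes this kind of unit inspectable. A real model would learn the query and use high-dimensional embeddings rather than the hand-built ones above.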


