Discovering nuclear localization signal universe through a novel deep learning model with interpretable attention units
Yifan Li, Xiaoyong Pan, Hong-Bin Shen
bioRxiv - Bioinformatics, 2024-08-10
DOI: https://doi.org/10.1101/2024.08.10.606103
Abstract
Nuclear localization signals (NLSs) are essential peptide fragments within proteins that play a decisive role in guiding proteins into the cell nucleus. Determining the existence and precise locations of NLSs experimentally is time-consuming and complicated, resulting in a scarcity of experimentally validated NLS fragments. Consequently, annotated NLS datasets are relatively small, presenting challenges for data-driven methods. In this study, we propose an innovative interpretable approach, NLSExplorer, which leverages large-scale protein language models to capture crucial biological information with a novel attention-based deep network for NLS identification. By utilizing the knowledge retrieved from protein language models, NLSExplorer achieves superior predictive performance compared to existing methods on two NLS benchmark datasets. Additionally, NLSExplorer can detect various kinds of segments highly correlated with nuclear transport, such as nuclear export signals. We employ NLSExplorer to investigate potential NLSs and other domains that are important for nuclear transport in nucleus-localized proteins within the Swiss-Prot database. Further comprehensive pattern analysis of all these segments uncovers a potential NLS space and the internal relationships among important nuclear transport segments across 416 species. This study not only introduces a powerful tool for predicting and exploring NLS space, but also offers a versatile network that detects characteristic domains and motifs of NLSs.
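The abstract does not describe NLSExplorer's architecture in detail, so the following is only an illustrative sketch of the general idea behind attention-based segment discovery, not the authors' implementation. It assumes per-residue embeddings (here a hypothetical two-dimensional toy embedding standing in for protein language model output), scores them against a single hypothetical query vector with scaled dot-product attention, and reports contiguous runs of residues whose attention weight exceeds the uniform baseline 1/L. The function names, the query vector, the threshold, and the minimum segment length are all assumptions made for illustration. The toy sequence embeds the classical SV40 large T-antigen NLS, PKKKRKV.

```python
import math


def toy_embedding(residue):
    """Hypothetical stand-in for a protein language model embedding.

    Dimension 0 flags basic residues (K, R); dimension 1 is a constant bias.
    """
    return [1.0 if residue in "KR" else 0.0, 1.0]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(embeddings, query):
    """Scaled dot-product attention of one query over per-residue embeddings."""
    scale = math.sqrt(len(query))
    scores = [
        sum(q * x for q, x in zip(query, emb)) / scale for emb in embeddings
    ]
    return softmax(scores)


def top_segments(sequence, weights, min_len=3):
    """Contiguous runs whose weight exceeds the uniform baseline 1/L.

    Returns (start, end, subsequence) tuples with a half-open [start, end) range.
    """
    baseline = 1.0 / len(sequence)
    segments, start = [], None
    for i, w in enumerate(weights):
        if w > baseline and start is None:
            start = i  # a highly attended run begins here
        elif w <= baseline and start is not None:
            if i - start >= min_len:
                segments.append((start, i, sequence[start:i]))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(sequence) - start >= min_len:
        segments.append((start, len(sequence), sequence[start:]))
    return segments


sequence = "MAPKKKRKVGS"  # toy sequence containing the SV40 NLS PKKKRKV
embeddings = [toy_embedding(aa) for aa in sequence]
query = [4.0, 0.0]  # hypothetical query that attends to basic residues
weights = attention_weights(embeddings, query)
print(top_segments(sequence, weights))  # [(3, 8, 'KKKRK')]
```

In this toy setting the highly attended run is the basic core of the signal. In a trained model the query and embeddings would be learned rather than hand-set, and the highlighted segments would need validation against annotated NLSs.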