Genomics, Proteomics & Bioinformatics: Latest Articles

TSNAdb v2.0: The Updated Version of Tumor-specific Neoantigen Database
IF 9.5; Zone 2 (Biology); Q1 GENETICS & HEREDITY; Pub Date: 2023-04-01; DOI: 10.1016/j.gpb.2022.09.012
Jingcheng Wu, Wenfan Chen, Yuxuan Zhou, Ying Chi, Xiansheng Hua, Jian Wu, Xun Gu, Shuqing Chen, Zhan Zhou

In recent years, neoantigens have been recognized as ideal targets for tumor immunotherapy. With the development of neoantigen-based tumor immunotherapy, comprehensive neoantigen databases are urgently needed to meet the growing demand for clinical studies. We have built the tumor-specific neoantigen database (TSNAdb) previously, which has attracted much attention. In this study, we provide TSNAdb v2.0, an updated version of the TSNAdb. TSNAdb v2.0 offers several new features, including (1) adopting more stringent criteria for neoantigen identification, (2) providing predicted neoantigens derived from three types of somatic mutations, and (3) collecting experimentally validated neoantigens and dividing them according to the experimental level. TSNAdb v2.0 is freely available at https://pgx.zju.edu.cn/tsnadb/.

Genomics, Proteomics & Bioinformatics, vol. 21, no. 2, pp. 259-266 (Journal Article)
Citations: 5
OOCDB: A Comprehensive, Systematic, and Real-time Organs-on-a-chip Database
IF 9.5; Zone 2 (Biology); Q1 GENETICS & HEREDITY; Pub Date: 2023-04-01; DOI: 10.1016/j.gpb.2023.01.001
Jian Li, Weicheng Liang, Zaozao Chen, Xingyu Li, Pan Gu, Anna Liu, Pin Chen, Qiwei Li, Xueyin Mei, Jing Yang, Jun Liu, Lincao Jiang, Zhongze Gu

Organs-on-a-chip is a microfluidic microphysiological system that uses microfluidic technology to analyze the structure and function of living human cells at the tissue and organ levels in vitro. Organs-on-a-chip technology, as opposed to traditional two-dimensional cell culture and animal models, can more closely simulate pathologic and toxicologic interactions between different organs or tissues and reflect the collaborative response of multiple organs to drugs. Despite the fact that many organs-on-a-chip-related data have been published, none of the current databases have all of the following functions: searching, downloading, as well as analyzing data and results from the literature on organs-on-a-chip. Therefore, we created an organs-on-a-chip database (OOCDB) as a platform to integrate information about organs-on-a-chip from various sources, including literature, patents, raw data from microarray and transcriptome sequencing, several open-access datasets of organs-on-a-chip and organoids, and data generated in our laboratory. OOCDB contains dozens of sub-databases and analysis tools, and each sub-database contains various data associated with organs-on-a-chip, with the goal of providing researchers with a comprehensive, systematic, and convenient search engine. Furthermore, it offers a variety of other functions, such as mathematical modeling, three-dimensional modeling, and citation mapping, to meet the needs of researchers and promote the development of organs-on-a-chip. The OOCDB is available at http://www.organchip.cn.

Genomics, Proteomics & Bioinformatics, vol. 21, no. 2, pp. 243-258 (Journal Article)
Citations: 0
PlantCADB: A Comprehensive Plant Chromatin Accessibility Database
IF 9.5; Zone 2 (Biology); Q1 GENETICS & HEREDITY; Pub Date: 2023-04-01; DOI: 10.1016/j.gpb.2022.10.005
Ke Ding, Shanwen Sun, Yang Luo, Chaoyue Long, Jingwen Zhai, Yixiao Zhai, Guohua Wang

Chromatin accessibility landscapes are essential for detecting regulatory elements, illustrating the corresponding regulatory networks, and, ultimately, understanding the molecular basis underlying key biological processes. With the advancement of sequencing technologies, a large volume of chromatin accessibility data has been accumulated and integrated for humans and other mammals. These data have greatly advanced the study of disease pathogenesis, cancer survival prognosis, and tissue development. To advance the understanding of molecular mechanisms regulating plant key traits and biological processes, we developed a comprehensive plant chromatin accessibility database (PlantCADB) from 649 samples of 37 species. These samples are abiotic stress-related (such as heat, cold, drought, and salt; 159 samples), development-related (232 samples), and/or tissue-specific (376 samples). Overall, 18,339,426 accessible chromatin regions (ACRs) were compiled. These ACRs were annotated with genomic information, associated genes, transcription factor footprint, motif, and single-nucleotide polymorphisms (SNPs). Additionally, PlantCADB provides various tools to visualize ACRs and corresponding annotations. It thus forms an integrated, annotated, and analyzed plant-related chromatin accessibility resource, which can aid in better understanding genetic regulatory networks underlying development, important traits, stress adaptations, and evolution. PlantCADB is freely available at https://bioinfor.nefu.edu.cn/PlantCADB/.
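
The per-category sample counts quoted in the abstract add up to more than the 649 samples in the database. That is consistent with the "and/or" wording: a single sample can fall into more than one category. The short check below uses only the figures from the abstract; the variable names are illustrative and not part of PlantCADB.

```python
# Figures are taken from the abstract; the names are illustrative only.
total_samples = 649
category_counts = {
    "abiotic stress-related": 159,
    "development-related": 232,
    "tissue-specific": 376,
}

# Total number of category memberships across all samples.
category_sum = sum(category_counts.values())

# Memberships beyond the sample total. A positive value means the
# categories overlap, so this is a lower bound on shared memberships.
extra_memberships = category_sum - total_samples

print(category_sum, extra_memberships)
```

The difference is only a lower bound on the overlap, since the abstract does not say whether every sample belongs to at least one of the three categories.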

Genomics, Proteomics & Bioinformatics, vol. 21, no. 2, pp. 311-323 (Journal Article)
Citations: 1
TransDFL: Identification of Disordered Flexible Linkers in Proteins by Transfer Learning
IF 9.5; Zone 2 (Biology); Q1 GENETICS & HEREDITY; Pub Date: 2023-04-01; DOI: 10.1016/j.gpb.2022.10.004
Yihe Pang, Bin Liu

Disordered flexible linkers (DFLs) are the functional disordered regions in proteins, which are the sub-regions of intrinsically disordered regions (IDRs) and play important roles in connecting domains and maintaining inter-domain interactions. Trained with the limited available DFLs, the existing DFL predictors based on the machine learning techniques tend to predict the ordered residues as DFLs, leading to a high false positive rate (FPR) and low prediction accuracy. Previous studies have shown that DFLs are extremely flexible disordered regions, which are usually predicted as disordered residues with high confidence [P(D) > 0.9] by an IDR predictor. Therefore, transferring an IDR predictor to an accurate DFL predictor is of great significance for understanding the functions of IDRs. In this study, we proposed a new predictor called TransDFL for identifying DFLs by transferring the RFPR-IDP predictor for IDR identification to the DFL prediction. The RFPR-IDP was pre-trained with IDR sequences to learn the general features between IDRs and DFLs, which is helpful to reduce the false positives in the ordered regions. RFPR-IDP was fine-tuned with the DFL sequences to capture the specific features of DFLs so as to be transferred into the TransDFL. Experimental results of two application scenarios (prediction of DFLs only in IDRs or prediction of DFLs in entire proteins) showed that TransDFL consistently outperformed other existing DFL predictors with higher accuracy. The corresponding web server of TransDFL can be freely accessed at http://bliulab.net/TransDFL/.
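
The observation that DFLs are usually predicted as disordered with high confidence [P(D) > 0.9] can be illustrated with a small thresholding sketch. This is not the TransDFL method, which fine-tunes the RFPR-IDP predictor; it only shows how residues above that confidence level could be grouped into candidate segments. The function name, its input format, and the example probabilities are assumptions made for illustration.

```python
from typing import List, Tuple


def high_confidence_segments(
    disorder_probs: List[float], threshold: float = 0.9
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, of maximal runs
    of residues whose disorder probability exceeds the threshold."""
    segments = []
    start = None
    for i, p in enumerate(disorder_probs):
        if p > threshold:
            if start is None:
                start = i  # a new high-confidence run begins here
        elif start is not None:
            segments.append((start, i))  # the current run ends before i
            start = None
    if start is not None:
        # The sequence ended while still inside a run.
        segments.append((start, len(disorder_probs)))
    return segments


# Made-up per-residue probabilities, for illustration only.
example_probs = [0.2, 0.95, 0.97, 0.91, 0.5, 0.93, 0.99]
print(high_confidence_segments(example_probs))
```

Such segments would at most be candidates: according to the abstract, DFLs are sub-regions of IDRs, so a dedicated DFL predictor is still needed to separate linkers from other disordered regions.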

Genomics, Proteomics & Bioinformatics, vol. 21, no. 2, pp. 359-369 (Journal Article)
Citations: 6
mvPPT: A Highly Efficient and Sensitive Pathogenicity Prediction Tool for Missense Variants
IF 9.5; Zone 2 (Biology); Q1 GENETICS & HEREDITY; Pub Date: 2023-04-01; DOI: 10.1016/j.gpb.2022.07.005
Shi-Yuan Tong, Ke Fan, Zai-Wei Zhou, Lin-Yun Liu, Shu-Qing Zhang, Yinghui Fu, Guang-Zhong Wang, Ying Zhu, Yong-Chun Yu

Next-generation sequencing technologies both boost the discovery of variants in the human genome and exacerbate the challenges of pathogenic variant identification. In this study, we developed Pathogenicity Prediction Tool for missense variants (mvPPT), a highly sensitive and accurate missense variant classifier based on gradient boosting. mvPPT adopts high-confidence training sets with a wide spectrum of variant profiles, and extracts three categories of features, including scores from existing prediction tools, frequencies (allele frequencies, amino acid frequencies, and genotype frequencies), and genomic context. Compared with established predictors, mvPPT achieves superior performance in all test sets, regardless of data source. In addition, our study also provides guidance for training set and feature selection strategies, as well as reveals highly relevant features, which may further provide biological insights into variant pathogenicity. mvPPT is freely available at http://www.mvppt.club/.

Genomics, Proteomics & Bioinformatics, vol. 21, no. 2, pp. 414-426 (Journal Article)
Citations: 0
RNA2Immune: A Database of Experimentally Supported Data Linking Non-coding RNA Regulation to The Immune System
IF 9.5; Zone 2 (Biology); Q1 GENETICS & HEREDITY; Pub Date: 2023-04-01; DOI: 10.1016/j.gpb.2022.05.001
Jianjian Wang, Shuang Li, Tianfeng Wang, Si Xu, Xu Wang, Xiaotong Kong, Xiaoyu Lu, Huixue Zhang, Lifang Li, Meng Feng, Shangwei Ning, Lihua Wang

Non-coding RNAs (ncRNAs), such as microRNAs (miRNAs), long non-coding RNAs (lncRNAs), and circular RNAs (circRNAs), have emerged as important regulators of the immune system and are involved in the control of immune cell biology, disease pathogenesis, as well as vaccine responses. A repository of ncRNA–immune associations will facilitate our understanding of ncRNA-dependent mechanisms in the immune system and advance the development of therapeutics and prevention for immune disorders. Here, we describe a comprehensive database, RNA2Immune, which aims to provide a high-quality resource of experimentally supported database linking ncRNA regulatory mechanisms to immune cell function, immune disease, cancer immunology, and vaccines. The current version of RNA2Immune documents 50,433 immune–ncRNA associations in 42 host species, including (1) 6690 ncRNA associations with immune functions involving 31 immune cell types; (2) 38,672 ncRNA associations with 348 immune diseases; (3) 4833 ncRNA associations with cancer immunology; and (4) 238 ncRNA associations with vaccine responses involving 26 vaccine types targeting 22 diseases. RNA2Immune provides a user-friendly interface for browsing, searching, and downloading ncRNA–immune system associations. Collectively, RNA2Immune provides important information about how ncRNAs influence immune cell function, how dysregulation of these ncRNAs leads to pathological consequences (immune diseases and cancers), and how ncRNAs affect immune responses to vaccines. RNA2Immune is available at http://bio-bigdata.hrbmu.edu.cn/rna2immune/home.jsp.
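
The four association counts listed in the abstract can be checked against the stated total of 50,433. The sketch below uses only numbers from the abstract; the dictionary keys are shorthand labels chosen for illustration.

```python
# Counts are taken from the abstract; the keys are shorthand labels.
association_counts = {
    "immune functions": 6690,
    "immune diseases": 38672,
    "cancer immunology": 4833,
    "vaccine responses": 238,
}
stated_total = 50433

computed_total = sum(association_counts.values())

# True when the four categories account for every documented association.
totals_match = computed_total == stated_total

print(computed_total, totals_match)
```

The totals agree, which suggests that the four categories partition the documented associations without overlap.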

Genomics, Proteomics & Bioinformatics, vol. 21, no. 2, pp. 283-291 (Journal Article)
Citations: 2
CTRR-ncRNA: A Knowledgebase for Cancer Therapy Resistance and Recurrence Associated Non-coding RNAs
IF 9.5; Zone 2 (Biology); Q1 GENETICS & HEREDITY; Pub Date: 2023-04-01; DOI: 10.1016/j.gpb.2022.10.003
Tong Tang, Xingyun Liu, Rongrong Wu, Li Shen, Shumin Ren, Bairong Shen

Cancer therapy resistance and recurrence (CTRR) are the dominant causes of death in cancer patients. Recent studies have indicated that non-coding RNAs (ncRNAs) can not only reverse the resistance to cancer therapy but also are crucial biomarkers for the evaluation and prediction of CTRR. Herein, we developed CTRR-ncRNA, a knowledgebase of CTRR-associated ncRNAs, aiming to provide an accurate and comprehensive resource for research involving the association between CTRR and ncRNAs. Compared to most of the existing cancer databases, CTRR-ncRNA is focused on the clinical characterization of cancers, including cancer subtypes, as well as survival outcomes and responses to personalized therapy of cancer patients. Information pertaining to biomarker ncRNAs has also been documented for the development of personalized CTRR prediction. A user-friendly interface and several functional modules have been incorporated into the database. Based on the preliminary analysis of genotype–phenotype relationships, universal ncRNAs have been found to be potential biomarkers for CTRR. The CTRR-ncRNA is a translation-oriented knowledgebase and it provides a valuable resource for mechanistic investigations and explainable artificial intelligence-based modeling. CTRR-ncRNA is freely available to the public at http://ctrr.bioinf.org.cn/.

Genomics, Proteomics & Bioinformatics, vol. 21, no. 2, pp. 292-299 (Journal Article)
Citations: 0
inMTSCCA: An Integrated Multi-task Sparse Canonical Correlation Analysis for Multi-omic Brain Imaging Genetics
IF 9.5; Zone 2 (Biology); Q1 GENETICS & HEREDITY; Pub Date: 2023-04-01; DOI: 10.1016/j.gpb.2023.03.005
Lei Du, Jin Zhang, Ying Zhao, Muheng Shang, Lei Guo, Junwei Han, The Alzheimer's Disease Neuroimaging Initiative

Identifying genetic risk factors for Alzheimer’s disease (AD) is an important research topic. To date, different endophenotypes, such as imaging-derived endophenotypes and proteomic expression-derived endophenotypes, have shown great value in uncovering risk genes compared to case–control studies. Biologically, a co-varying pattern of different omics-derived endophenotypes could result from a shared genetic basis. However, existing methods mainly focus on the effect of endophenotypes alone; the effect of cross-endophenotype (CEP) associations remains largely unexploited. In this study, we used both endophenotypes and their CEP associations of multi-omic data to identify genetic risk factors, and proposed two integrated multi-task sparse canonical correlation analysis (inMTSCCA) methods, i.e., pairwise endophenotype correlation-guided MTSCCA (pcMTSCCA) and high-order endophenotype correlation-guided MTSCCA (hocMTSCCA). pcMTSCCA employed pairwise correlations between magnetic resonance imaging (MRI)-derived, plasma-derived, and cerebrospinal fluid (CSF)-derived endophenotypes as an additional penalty. hocMTSCCA used high-order correlations among these multi-omic data for regularization. To identify genetic risk factors at individual and group levels, as well as altered endophenotypic markers, we introduced sparsity-inducing penalties for both models. We compared pcMTSCCA and hocMTSCCA with three related methods on both simulated and real datasets (the latter consisting of neuroimaging data, proteomic analytes, and genetic data). The results showed that our methods obtained better or comparable canonical correlation coefficients (CCCs) and better feature subsets than benchmarks. Most importantly, the identified genetic loci and heterogeneous endophenotypic markers showed high relevance. Therefore, jointly using multi-omic endophenotypes and their CEP associations is a promising way to reveal genetic risk factors. The source code and manual of inMTSCCA are available at https://ngdc.cncb.ac.cn/biocode/tools/BT007330.
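
The abstract above does not state the objective functions of pcMTSCCA or hocMTSCCA, so the following is only a minimal sketch of generic two-view sparse canonical correlation analysis, the family of methods the paper builds on. It alternates soft-thresholded updates of two weight vectors. The function names, the penalty values, and the toy data are all assumptions made for illustration; the multi-task structure and the correlation-guided penalties described in the abstract are not implemented here.

```python
import numpy as np


def soft_threshold(v, lam):
    """Shrink each entry toward zero by lam (the proximal step of an L1 penalty)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def normalize(v):
    """Scale v to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def sparse_cca(X, Y, lam_u=0.3, lam_v=0.3, n_iter=100):
    """Generic two-view sparse CCA by alternating soft-thresholded updates.

    X: (n, p) matrix, e.g. genetic features (hypothetical).
    Y: (n, q) matrix, e.g. endophenotype features (hypothetical).
    Returns sparse weight vectors u (p,), v (q,) and the sample correlation
    between the projections X @ u and Y @ v.
    """
    # Standardize columns so that C holds sample cross-correlations.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    C = X.T @ Y / X.shape[0]

    # Start from the leading right singular vector of C.
    _, _, Vt = np.linalg.svd(C, full_matrices=False)
    v = Vt[0]
    for _ in range(n_iter):
        u = normalize(soft_threshold(C @ v, lam_u))
        v = normalize(soft_threshold(C.T @ u, lam_v))

    corr = np.corrcoef(X @ u, Y @ v)[0, 1]
    return u, v, corr


def make_toy_views(n=300, p=20, q=10, seed=0):
    """Synthetic data: one latent signal drives 3 columns of X and 2 of Y."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    X = rng.normal(size=(n, p))
    Y = rng.normal(size=(n, q))
    X[:, :3] = z[:, None] + 0.5 * X[:, :3]
    Y[:, :2] = z[:, None] + 0.5 * Y[:, :2]
    return X, Y


if __name__ == "__main__":
    X, Y = make_toy_views()
    u, v, corr = sparse_cca(X, Y)
    print("nonzero weights in u:", np.flatnonzero(u))
    print("nonzero weights in v:", np.flatnonzero(v))
    print("canonical correlation:", round(float(corr), 3))
```

In this toy setting the soft-thresholding step is what produces sparsity: features whose cross-correlation with the other view falls below the penalty receive a weight of exactly zero, which is the sense in which such models select feature subsets.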

Genomics, Proteomics & Bioinformatics, vol. 21, no. 2, pp. 396-413.
Citations: 0
iHypoxia: An Integrative Database of Protein Expression Dynamics in Response to Hypoxia in Animals
IF 9.5 Zone 2 (Biology) Q1 GENETICS & HEREDITY Pub Date: 2023-04-01 DOI: 10.1016/j.gpb.2022.12.001
Ze-Xian Liu, Panqin Wang, Qingfeng Zhang, Shihua Li, Yuxin Zhang, Yutong Guo, Chongchong Jia, Tian Shao, Lin Li, Han Cheng, Zhenlong Wang

Mammals have evolved mechanisms to sense hypoxia and induce hypoxic responses. Recently, high-throughput techniques have greatly promoted global studies of protein expression changes during hypoxia and the identification of candidate genes associated with hypoxia-adaptive evolution, which have contributed to the understanding of the complex regulatory networks of hypoxia. In this study, we developed an integrated resource for the expression dynamics of proteins in response to hypoxia (iHypoxia), and this database contains 2589 expression events of 1944 proteins identified by low-throughput experiments (LTEs) and 422,553 quantitative expression events of 33,559 proteins identified by high-throughput experiments from five mammals that exhibit a response to hypoxia. Various experimental details, such as the hypoxic experimental conditions, expression patterns, and sample types, were carefully collected and integrated. Furthermore, 8788 candidate genes from diverse species inhabiting low-oxygen environments were also integrated. In addition, we conducted an orthologous search and computationally identified 394,141 proteins that may respond to hypoxia among 48 animals. An enrichment analysis of human proteins identified from LTEs shows that these proteins are enriched in certain drug targets and cancer genes. Annotation of known posttranslational modification (PTM) sites in the proteins identified by LTEs reveals that these proteins undergo extensive PTMs, particularly phosphorylation, ubiquitination, and acetylation. iHypoxia provides a convenient and user-friendly method for users to obtain hypoxia-related information of interest. We anticipate that iHypoxia, which is freely accessible at https://ihypoxia.omicsbio.info, will advance the understanding of hypoxia and serve as a valuable data resource.

Genomics, Proteomics & Bioinformatics, vol. 21, no. 2, pp. 267-277.
Citations: 0
NetGO 3.0: Protein Language Model Improves Large-scale Functional Annotations
IF 9.5 Zone 2 (Biology) Q1 GENETICS & HEREDITY Pub Date: 2023-04-01 DOI: 10.1016/j.gpb.2023.04.001
Shaojun Wang, Ronghui You, Yunjia Liu, Yi Xiong, Shanfeng Zhu

As one of the state-of-the-art automated function prediction (AFP) methods, NetGO 2.0 integrates multi-source information to improve the performance. However, it mainly utilizes the proteins with experimentally supported functional annotations without leveraging valuable information from a vast number of unannotated proteins. Recently, protein language models have been proposed to learn informative representations [e.g., Evolutionary Scale Modeling (ESM)-1b embedding] from protein sequences based on self-supervision. Here, we represented each protein by ESM-1b and used logistic regression (LR) to train a new model, LR-ESM, for AFP. The experimental results showed that LR-ESM achieved comparable performance with the best-performing component of NetGO 2.0. Therefore, by incorporating LR-ESM into NetGO 2.0, we developed NetGO 3.0 to improve the performance of AFP extensively. NetGO 3.0 is freely accessible at https://dmiip.sjtu.edu.cn/ng3.0.
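
The LR-ESM idea in the abstract above, a logistic regression trained on fixed protein embeddings, can be sketched roughly as follows. This is not the authors' implementation: the random vectors stand in for ESM-1b embeddings, the three label columns stand in for functional annotation terms, and the function names, learning rate, and regularization strength are all assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_one_vs_rest_lr(X, Y, lr=0.5, epochs=300, l2=1e-3):
    """Fit one independent logistic regression per label column.

    X: (n_proteins, dim) matrix of fixed embeddings (placeholders here).
    Y: (n_proteins, n_terms) binary matrix, 1 if the protein has the term.
    Uses full-batch gradient descent on the L2-regularized log-loss.
    Returns weights W (dim, n_terms) and biases b (n_terms,).
    """
    n, dim = X.shape
    W = np.zeros((dim, Y.shape[1]))
    b = np.zeros(Y.shape[1])
    for _ in range(epochs):
        P = sigmoid(X @ W + b)
        G = P - Y  # gradient of the log-loss with respect to the logits
        W -= lr * (X.T @ G / n + l2 * W)
        b -= lr * G.mean(axis=0)
    return W, b


def predict_scores(X, W, b):
    """Return a score in [0, 1] for every (protein, term) pair."""
    return sigmoid(X @ W + b)


def make_toy_embeddings(n=200, dim=16, n_terms=3, seed=0):
    """Random stand-ins for embeddings, with labels from a hidden linear rule."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, dim))
    hidden_W = rng.normal(size=(dim, n_terms))
    Y = (X @ hidden_W > 0).astype(float)
    return X, Y


if __name__ == "__main__":
    X, Y = make_toy_embeddings()
    W, b = train_one_vs_rest_lr(X, Y)
    scores = predict_scores(X, W, b)
    accuracy = ((scores > 0.5) == (Y > 0.5)).mean()
    print("training accuracy on toy data:", round(float(accuracy), 3))
```

Treating each term as an independent binary problem keeps the sketch simple; how NetGO 3.0 combines such per-term scores with its other components is not described in the abstract and is not modeled here.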

Genomics, Proteomics & Bioinformatics, vol. 21, no. 2, pp. 349-358.
Citations: 6