In recent years, neoantigens have been recognized as ideal targets for tumor immunotherapy. With the development of neoantigen-based tumor immunotherapy, comprehensive neoantigen databases are urgently needed to meet the growing demand for clinical studies. We have built the tumor-specific neoantigen database (TSNAdb) previously, which has attracted much attention. In this study, we provide TSNAdb v2.0, an updated version of the TSNAdb. TSNAdb v2.0 offers several new features, including (1) adopting more stringent criteria for neoantigen identification, (2) providing predicted neoantigens derived from three types of somatic mutations, and (3) collecting experimentally validated neoantigens and dividing them according to the experimental level. TSNAdb v2.0 is freely available at https://pgx.zju.edu.cn/tsnadb/.
TSNAdb v2.0: The Updated Version of Tumor-specific Neoantigen Database. Jingcheng Wu, Wenfan Chen, Yuxuan Zhou, Ying Chi, Xiansheng Hua, Jian Wu, Xun Gu, Shuqing Chen, Zhan Zhou. Genomics, Proteomics & Bioinformatics 21(2), pages 259-266. Pub Date: 2023-04-01 DOI: 10.1016/j.gpb.2022.09.012
Pub Date: 2023-04-01 DOI: 10.1016/j.gpb.2023.01.001
Jian Li, Weicheng Liang, Zaozao Chen, Xingyu Li, Pan Gu, Anna Liu, Pin Chen, Qiwei Li, Xueyin Mei, Jing Yang, Jun Liu, Lincao Jiang, Zhongze Gu
Organs-on-a-chip is a microfluidic microphysiological system that uses microfluidic technology to analyze the structure and function of living human cells at the tissue and organ levels in vitro. Organs-on-a-chip technology, as opposed to traditional two-dimensional cell culture and animal models, can more closely simulate pathologic and toxicologic interactions between different organs or tissues and reflect the collaborative response of multiple organs to drugs. Although many organs-on-a-chip-related data have been published, no current database supports all of the following functions: searching, downloading, and analyzing data and results from the literature on organs-on-a-chip. Therefore, we created an organs-on-a-chip database (OOCDB) as a platform to integrate information about organs-on-a-chip from various sources, including literature, patents, raw data from microarray and transcriptome sequencing, several open-access datasets of organs-on-a-chip and organoids, and data generated in our laboratory. OOCDB contains dozens of sub-databases and analysis tools, and each sub-database contains various data associated with organs-on-a-chip, with the goal of providing researchers with a comprehensive, systematic, and convenient search engine. Furthermore, it offers a variety of other functions, such as mathematical modeling, three-dimensional modeling, and citation mapping, to meet the needs of researchers and promote the development of organs-on-a-chip. The OOCDB is available at http://www.organchip.cn.
OOCDB: A Comprehensive, Systematic, and Real-time Organs-on-a-chip Database. Genomics, Proteomics & Bioinformatics 21(2), pages 243-258.
Pub Date: 2023-04-01 DOI: 10.1016/j.gpb.2022.10.005
Ke Ding, Shanwen Sun, Yang Luo, Chaoyue Long, Jingwen Zhai, Yixiao Zhai, Guohua Wang
Chromatin accessibility landscapes are essential for detecting regulatory elements, illustrating the corresponding regulatory networks, and, ultimately, understanding the molecular basis underlying key biological processes. With the advancement of sequencing technologies, a large volume of chromatin accessibility data has been accumulated and integrated for humans and other mammals. These data have greatly advanced the study of disease pathogenesis, cancer survival prognosis, and tissue development. To advance the understanding of molecular mechanisms regulating plant key traits and biological processes, we developed a comprehensive plant chromatin accessibility database (PlantCADB) from 649 samples of 37 species. These samples are abiotic stress-related (such as heat, cold, drought, and salt; 159 samples), development-related (232 samples), and/or tissue-specific (376 samples). Overall, 18,339,426 accessible chromatin regions (ACRs) were compiled. These ACRs were annotated with genomic information, associated genes, transcription factor footprint, motif, and single-nucleotide polymorphisms (SNPs). Additionally, PlantCADB provides various tools to visualize ACRs and corresponding annotations. It thus forms an integrated, annotated, and analyzed plant-related chromatin accessibility resource, which can aid in better understanding genetic regulatory networks underlying development, important traits, stress adaptations, and evolution. PlantCADB is freely available at https://bioinfor.nefu.edu.cn/PlantCADB/.
PlantCADB: A Comprehensive Plant Chromatin Accessibility Database. Genomics, Proteomics & Bioinformatics 21(2), pages 311-323.
Pub Date: 2023-04-01 DOI: 10.1016/j.gpb.2022.10.004
Yihe Pang, Bin Liu
Disordered flexible linkers (DFLs) are functional disordered regions in proteins. They are sub-regions of intrinsically disordered regions (IDRs) and play important roles in connecting domains and maintaining inter-domain interactions. Trained with the limited available DFLs, the existing DFL predictors based on machine learning techniques tend to predict ordered residues as DFLs, leading to a high false positive rate (FPR) and low prediction accuracy. Previous studies have shown that DFLs are extremely flexible disordered regions, which are usually predicted as disordered residues with high confidence [P(D) > 0.9] by an IDR predictor. Therefore, transferring an IDR predictor to an accurate DFL predictor is of great significance for understanding the functions of IDRs. In this study, we proposed a new predictor called TransDFL for identifying DFLs by transferring the RFPR-IDP predictor for IDR identification to DFL prediction. RFPR-IDP was pre-trained with IDR sequences to learn the general features shared by IDRs and DFLs, which helps to reduce false positives in the ordered regions. RFPR-IDP was then fine-tuned with DFL sequences to capture the specific features of DFLs and thereby transferred into TransDFL. Experimental results of two application scenarios (prediction of DFLs only in IDRs or prediction of DFLs in entire proteins) showed that TransDFL consistently outperformed other existing DFL predictors with higher accuracy. The corresponding web server of TransDFL can be freely accessed at http://bliulab.net/TransDFL/.
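The high-confidence criterion mentioned above, P(D) > 0.9, can be illustrated with a small sketch. This is not the TransDFL or RFPR-IDP implementation; it only shows how per-residue disorder probabilities from some IDR predictor could be thresholded into contiguous candidate segments. The function name `high_confidence_segments`, the `min_len` parameter, and the example scores are hypothetical.

```python
from typing import List, Tuple


def high_confidence_segments(
    p_disorder: List[float], threshold: float = 0.9, min_len: int = 1
) -> List[Tuple[int, int]]:
    """Return 0-based, end-exclusive (start, end) spans in which every
    residue has a disorder probability strictly above `threshold`."""
    segments: List[Tuple[int, int]] = []
    start = None  # start index of the segment currently being extended
    for i, p in enumerate(p_disorder):
        if p > threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run ended at residue i - 1; keep it if it is long enough.
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(p_disorder) - start >= min_len:
        segments.append((start, len(p_disorder)))
    return segments


# Hypothetical per-residue P(D) values, for illustration only.
scores = [0.20, 0.95, 0.97, 0.91, 0.40, 0.92, 0.99, 0.10]
print(high_confidence_segments(scores))  # [(1, 4), (5, 7)]
```

Such segments would only be candidates: per the abstract, DFLs are sub-regions of IDRs, so a dedicated DFL predictor is still needed to decide which high-confidence disordered residues are linkers.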
TransDFL: Identification of Disordered Flexible Linkers in Proteins by Transfer Learning. Genomics, Proteomics & Bioinformatics 21(2), pages 359-369.
Pub Date: 2023-04-01 DOI: 10.1016/j.gpb.2022.07.005
Shi-Yuan Tong, Ke Fan, Zai-Wei Zhou, Lin-Yun Liu, Shu-Qing Zhang, Yinghui Fu, Guang-Zhong Wang, Ying Zhu, Yong-Chun Yu
Next-generation sequencing technologies both boost the discovery of variants in the human genome and exacerbate the challenges of pathogenic variant identification. In this study, we developed Pathogenicity Prediction Tool for missense variants (mvPPT), a highly sensitive and accurate missense variant classifier based on gradient boosting. mvPPT adopts high-confidence training sets with a wide spectrum of variant profiles, and extracts three categories of features: scores from existing prediction tools, frequencies (allele frequencies, amino acid frequencies, and genotype frequencies), and genomic context. Compared with established predictors, mvPPT achieves superior performance in all test sets, regardless of data source. In addition, our study provides guidance for training set and feature selection strategies and reveals highly relevant features, which may further provide biological insights into variant pathogenicity. mvPPT is freely available at http://www.mvppt.club/.
mvPPT: A Highly Efficient and Sensitive Pathogenicity Prediction Tool for Missense Variants. Genomics, Proteomics & Bioinformatics 21(2), pages 414-426.
Pub Date: 2023-04-01 DOI: 10.1016/j.gpb.2022.05.001
Jianjian Wang, Shuang Li, Tianfeng Wang, Si Xu, Xu Wang, Xiaotong Kong, Xiaoyu Lu, Huixue Zhang, Lifang Li, Meng Feng, Shangwei Ning, Lihua Wang
Non-coding RNAs (ncRNAs), such as microRNAs (miRNAs), long non-coding RNAs (lncRNAs), and circular RNAs (circRNAs), have emerged as important regulators of the immune system and are involved in the control of immune cell biology, disease pathogenesis, and vaccine responses. A repository of ncRNA–immune associations will facilitate our understanding of ncRNA-dependent mechanisms in the immune system and advance the development of therapeutics and prevention for immune disorders. Here, we describe a comprehensive database, RNA2Immune, which aims to provide a high-quality resource of experimentally supported data linking ncRNA regulatory mechanisms to immune cell function, immune disease, cancer immunology, and vaccines. The current version of RNA2Immune documents 50,433 immune–ncRNA associations in 42 host species, including (1) 6690 ncRNA associations with immune functions involving 31 immune cell types; (2) 38,672 ncRNA associations with 348 immune diseases; (3) 4833 ncRNA associations with cancer immunology; and (4) 238 ncRNA associations with vaccine responses involving 26 vaccine types targeting 22 diseases. RNA2Immune provides a user-friendly interface for browsing, searching, and downloading ncRNA–immune system associations. Collectively, RNA2Immune provides important information about how ncRNAs influence immune cell function, how dysregulation of these ncRNAs leads to pathological consequences (immune diseases and cancers), and how ncRNAs affect immune responses to vaccines. RNA2Immune is available at http://bio-bigdata.hrbmu.edu.cn/rna2immune/home.jsp.
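As a quick consistency check on the figures quoted above, the four category counts can be summed and compared with the reported total. The sketch below uses only numbers from the abstract; the dictionary keys and variable names are illustrative labels, not RNA2Immune field names.

```python
# Association counts per category, as reported in the abstract.
associations = {
    "immune function": 6690,
    "immune disease": 38672,
    "cancer immunology": 4833,
    "vaccine response": 238,
}
reported_total = 50433

computed_total = sum(associations.values())
print(computed_total, computed_total == reported_total)  # 50433 True

# Share of each category in the reported total.
for name, count in associations.items():
    print(f"{name}: {100 * count / reported_total:.1f}%")
```

The four categories add up exactly to the reported 50,433 associations, with immune-disease associations accounting for roughly three quarters of the total.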
RNA2Immune: A Database of Experimentally Supported Data Linking Non-coding RNA Regulation to The Immune System. Genomics, Proteomics & Bioinformatics 21(2), pages 283-291.
Pub Date: 2023-04-01 DOI: 10.1016/j.gpb.2022.10.003
Tong Tang, Xingyun Liu, Rongrong Wu, Li Shen, Shumin Ren, Bairong Shen
Cancer therapy resistance and recurrence (CTRR) are the dominant causes of death in cancer patients. Recent studies have indicated that non-coding RNAs (ncRNAs) not only can reverse resistance to cancer therapy but also are crucial biomarkers for the evaluation and prediction of CTRR. Herein, we developed CTRR-ncRNA, a knowledgebase of CTRR-associated ncRNAs, aiming to provide an accurate and comprehensive resource for research involving the association between CTRR and ncRNAs. Compared to most of the existing cancer databases, CTRR-ncRNA is focused on the clinical characterization of cancers, including cancer subtypes, as well as survival outcomes and responses to personalized therapy of cancer patients. Information pertaining to biomarker ncRNAs has also been documented for the development of personalized CTRR prediction. A user-friendly interface and several functional modules have been incorporated into the database. Based on the preliminary analysis of genotype–phenotype relationships, universal ncRNAs have been found to be potential biomarkers for CTRR. CTRR-ncRNA is a translation-oriented knowledgebase that provides a valuable resource for mechanistic investigations and explainable artificial intelligence-based modeling. CTRR-ncRNA is freely available to the public at http://ctrr.bioinf.org.cn/.
CTRR-ncRNA: A Knowledgebase for Cancer Therapy Resistance and Recurrence Associated Non-coding RNAs. Genomics, Proteomics & Bioinformatics 21(2), pages 292-299.
Pub Date: 2023-04-01 DOI: 10.1016/j.gpb.2023.03.005
Lei Du, Jin Zhang, Ying Zhao, Muheng Shang, Lei Guo, Junwei Han, The Alzheimer's Disease Neuroimaging Initiative
Identifying genetic risk factors for Alzheimer’s disease (AD) is an important research topic. To date, different endophenotypes, such as imaging-derived endophenotypes and proteomic expression-derived endophenotypes, have shown great value in uncovering risk genes compared to case–control studies. Biologically, a co-varying pattern of different omics-derived endophenotypes could result from a shared genetic basis. However, existing methods mainly focus on the effect of endophenotypes alone; the effect of cross-endophenotype (CEP) associations remains largely unexploited. In this study, we used both endophenotypes and their CEP associations of multi-omic data to identify genetic risk factors, and proposed two integrated multi-task sparse canonical correlation analysis (inMTSCCA) methods, i.e., pairwise endophenotype correlation-guided MTSCCA (pcMTSCCA) and high-order endophenotype correlation-guided MTSCCA (hocMTSCCA). pcMTSCCA employed pairwise correlations between magnetic resonance imaging (MRI)-derived, plasma-derived, and cerebrospinal fluid (CSF)-derived endophenotypes as an additional penalty. hocMTSCCA used high-order correlations among these multi-omic data for regularization. To identify genetic risk factors at individual and group levels, as well as altered endophenotypic markers, we introduced sparsity-inducing penalties for both models. We compared pcMTSCCA and hocMTSCCA with three related methods on both simulation and real (consisting of neuroimaging data, proteomic analytes, and genetic data) datasets. The results showed that our methods obtained better or comparable canonical correlation coefficients (CCCs) and better feature subsets than benchmarks. Most importantly, the identified genetic loci and heterogeneous endophenotypic markers showed high relevance. Therefore, jointly using multi-omic endophenotypes and their CEP associations is promising to reveal genetic risk factors. The source code and manual of inMTSCCA are available at https://ngdc.cncb.ac.cn/biocode/tools/BT007330.
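For orientation, the generic two-view sparse canonical correlation analysis (SCCA) problem that such methods build on can be written as follows. This is the standard textbook form, not the authors' pcMTSCCA or hocMTSCCA objective, whose multi-task structure and correlation-guided penalties are not given in the abstract. The symbols are assumptions introduced here: $X \in \mathbb{R}^{n \times p}$ holds column-centered genetic data, $Y \in \mathbb{R}^{n \times q}$ holds column-centered endophenotypes, $u$ and $v$ are the canonical weight vectors, and $c_1$, $c_2$ are sparsity budgets.

```latex
\begin{aligned}
\max_{u,\,v}\quad & u^{\top} X^{\top} Y v \\
\text{s.t.}\quad  & \lVert X u \rVert_2^2 \le 1, \qquad \lVert Y v \rVert_2^2 \le 1, \\
                  & \lVert u \rVert_1 \le c_1, \qquad \lVert v \rVert_1 \le c_2 ,
\end{aligned}
\qquad\qquad
\mathrm{CCC} = \frac{u^{\top} X^{\top} Y v}{\lVert X u \rVert_2 \, \lVert Y v \rVert_2}
```

The $\ell_1$ constraints are what make the weight vectors sparse, so that only a subset of genetic loci and endophenotypic markers receives non-zero weights; the CCC on the right is the correlation between the two projected views $Xu$ and $Yv$.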
inMTSCCA: An Integrated Multi-task Sparse Canonical Correlation Analysis for Multi-omic Brain Imaging Genetics. Genomics, Proteomics & Bioinformatics 21(2), pages 396-413.
Pub Date: 2023-04-01; DOI: 10.1016/j.gpb.2022.12.001
Ze-Xian Liu, Panqin Wang, Qingfeng Zhang, Shihua Li, Yuxin Zhang, Yutong Guo, Chongchong Jia, Tian Shao, Lin Li, Han Cheng, Zhenlong Wang
Mammals have evolved mechanisms to sense hypoxia and induce hypoxic responses. Recently, high-throughput techniques have greatly promoted global studies of protein expression changes during hypoxia and the identification of candidate genes associated with hypoxia-adaptive evolution, which have contributed to the understanding of the complex regulatory networks of hypoxia. In this study, we developed an integrated resource for the expression dynamics of proteins in response to hypoxia (iHypoxia), and this database contains 2589 expression events of 1944 proteins identified by low-throughput experiments (LTEs) and 422,553 quantitative expression events of 33,559 proteins identified by high-throughput experiments from five mammals that exhibit a response to hypoxia. Various experimental details, such as the hypoxic experimental conditions, expression patterns, and sample types, were carefully collected and integrated. Furthermore, 8788 candidate genes from diverse species inhabiting low-oxygen environments were also integrated. In addition, we conducted an orthologous search and computationally identified 394,141 proteins that may respond to hypoxia among 48 animals. An enrichment analysis of human proteins identified from LTEs shows that these proteins are enriched in certain drug targets and cancer genes. Annotation of known posttranslational modification (PTM) sites in the proteins identified by LTEs reveals that these proteins undergo extensive PTMs, particularly phosphorylation, ubiquitination, and acetylation. iHypoxia provides a convenient and user-friendly method for users to obtain hypoxia-related information of interest. We anticipate that iHypoxia, which is freely accessible at https://ihypoxia.omicsbio.info, will advance the understanding of hypoxia and serve as a valuable data resource.
iHypoxia: An Integrative Database of Protein Expression Dynamics in Response to Hypoxia in Animals. Genomics, Proteomics & Bioinformatics, 21(2), pages 267-277, 2023-04-01. DOI: 10.1016/j.gpb.2022.12.001
Pub Date: 2023-04-01; DOI: 10.1016/j.gpb.2023.04.001
Shaojun Wang, Ronghui You, Yunjia Liu, Yi Xiong, Shanfeng Zhu
As one of the state-of-the-art automated function prediction (AFP) methods, NetGO 2.0 integrates multi-source information to improve the performance. However, it mainly utilizes the proteins with experimentally supported functional annotations without leveraging valuable information from a vast number of unannotated proteins. Recently, protein language models have been proposed to learn informative representations [e.g., Evolutionary Scale Modeling (ESM)-1b embedding] from protein sequences based on self-supervision. Here, we represented each protein by ESM-1b and used logistic regression (LR) to train a new model, LR-ESM, for AFP. The experimental results showed that LR-ESM achieved comparable performance with the best-performing component of NetGO 2.0. Therefore, by incorporating LR-ESM into NetGO 2.0, we developed NetGO 3.0 to improve the performance of AFP extensively. NetGO 3.0 is freely accessible at https://dmiip.sjtu.edu.cn/ng3.0.
NetGO 3.0: Protein Language Model Improves Large-scale Functional Annotations. Genomics, Proteomics & Bioinformatics, 21(2), pages 349-358, 2023-04-01. DOI: 10.1016/j.gpb.2023.04.001
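The LR-ESM idea in the NetGO 3.0 abstract, a fixed-length embedding per protein fed to logistic regression, can be illustrated with a minimal sketch. This is not the authors' implementation. The embeddings are small random toy vectors standing in for ESM-1b output, the labels `TERM_A` and `TERM_B` are hypothetical placeholders rather than real GO terms, and the one-classifier-per-term setup and all hyperparameters are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_lr(X, y, epochs=200, lr=0.5, l2=1e-3):
    """Fit one binary logistic regression by batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Averaged gradient step with a small L2 penalty on the weights.
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_one_vs_rest(X, labels, terms):
    """Train one independent classifier per term (multi-label setup)."""
    return {
        t: train_lr(X, [1.0 if t in ls else 0.0 for ls in labels])
        for t in terms
    }


def toy_embedding(center, rng, noise=0.3):
    # Stand-in for a protein embedding; NOT a real ESM-1b vector.
    return [c + rng.gauss(0.0, noise) for c in center]


rng = random.Random(0)
center_a = [1.0] * 4 + [-1.0] * 4   # toy cluster for TERM_A proteins
center_b = [-1.0] * 4 + [1.0] * 4   # toy cluster for TERM_B proteins
X = [toy_embedding(center_a, rng) for _ in range(20)]
X += [toy_embedding(center_b, rng) for _ in range(20)]
labels = [{"TERM_A"}] * 20 + [{"TERM_B"}] * 20

models = train_one_vs_rest(X, labels, ["TERM_A", "TERM_B"])
scores = {t: predict_proba(m, center_a) for t, m in models.items()}
```

Here `scores` holds one probability per term for a query vector at the centre of the first cluster. In practice the toy vectors would be replaced by embeddings computed from protein sequences, and the per-term scores would be one component among those that NetGO 3.0 combines.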