Single-cell RNA sequencing (scRNA-seq) is revolutionizing the study of complex and dynamic cellular mechanisms. However, cell type annotation remains a major challenge because it largely relies on a priori knowledge and manual curation, which is cumbersome and subjective. The increasing number of scRNA-seq datasets, as well as numerous published genetic studies, has motivated us to build a comprehensive human cell type reference atlas. Here, we present decoding Cell type Specificity (deCS), an automatic cell type annotation method augmented by a comprehensive collection of human cell type expression profiles and marker genes. We used deCS to annotate scRNA-seq data from various tissue types and systematically evaluated the annotation accuracy under different conditions, including reference panels, sequencing depth, and feature selection strategies. Our results demonstrate that expanding the references is critical for improving annotation accuracy. Compared with many existing state-of-the-art annotation tools, deCS significantly reduced computation time and increased accuracy. deCS can be integrated into the standard scRNA-seq analytical pipeline to enhance cell type annotation. Finally, we demonstrated the broad utility of deCS by identifying trait–cell type associations for 51 human complex traits, providing deep insights into the cellular mechanisms underlying disease pathogenesis. All deCS materials, including source code, user manual, demo data, and tutorials, are freely available at https://github.com/bsml320/deCS.
Genetic and epigenetic changes after polyploidization events could result in variable gene expression and modified regulatory networks. Here, using large-scale transcriptome data, we constructed co-expression networks for diploid, tetraploid, and hexaploid wheat species, and built a platform, named WheatCENet, for comparing the co-expression networks of allohexaploid wheat and its progenitors. WheatCENet supports searching and comparing specific functional co-expression networks, as well as identifying the related functions of the genes clustered therein. Functional annotations such as pathways, gene families, protein–protein interactions, microRNAs (miRNAs), and several lines of epigenome data are integrated into this platform, and Gene Ontology (GO) annotation, gene set enrichment analysis (GSEA), motif identification, and other useful tools are also included. Using WheatCENet, we found that the network of WHEAT ABERRANT PANICLE ORGANIZATION 1 (WAPO1) has more co-expressed genes related to spike development in hexaploid wheat than in its progenitors. We also found a novel CCWWWWWWGG (CArG) motif specifically in the promoter region of WAPO-A1, suggesting that neofunctionalization of the WAPO-A1 gene affects spikelet development in hexaploid wheat. WheatCENet is useful for investigating co-expression networks and conducting other analyses, and thus facilitates comparative and functional genomic studies in wheat. WheatCENet is freely available at http://bioinformatics.cpolar.cn/WheatCENet and http://bioinformatics.cau.edu.cn/WheatCENet.
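As a minimal illustration of what the CCWWWWWWGG (CArG) consensus means, the sketch below scans a DNA string for the motif. It relies only on the standard IUPAC convention that W denotes A or T. It is not the motif identification tool provided by WheatCENet; the function name `find_carg_sites` and the example sequence are hypothetical and made up for illustration.

```python
import re

# IUPAC code W stands for A or T, so CCWWWWWWGG expands to CC[AT]{6}GG.
# The lookahead lets overlapping occurrences be reported as well.
CARG_PATTERN = re.compile(r"(?=(CC[AT]{6}GG))")


def find_carg_sites(sequence: str) -> list:
    """Return (0-based start, matched 10-mer) for each CArG-box occurrence.

    The reverse complement of CC[AT]{6}GG is again of the form CC[AT]{6}GG,
    so scanning one strand is enough to find sites on both strands.
    """
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in CARG_PATTERN.finditer(seq)]


# Hypothetical sequence fragment, NOT taken from the WAPO-A1 promoter.
example_promoter = "GATTCCATATTAGGCTACCTTTTTTGGA"
sites = find_carg_sites(example_promoter)
for start, site in sites:
    print(start, site)
```

A G or C anywhere in the six central positions breaks the match, which is what distinguishes a CArG box from a generic CC...GG 10-mer.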
Integration of oncogenic DNA viruses into the human genome is a key step in most virus-induced carcinogenesis. Here, we constructed the virus integration site (VIS) Atlas database, an extensive collection of integration breakpoints for the three most prevalent oncoviruses (human papillomavirus, hepatitis B virus, and Epstein–Barr virus), based on next-generation sequencing (NGS) data, literature, and experimental data. There are 63,179 breakpoints and 47,411 junctional sequences with full annotations deposited in the VIS Atlas database, comprising 47 virus genotypes and 17 disease types. The VIS Atlas database provides (1) a genome browser for NGS breakpoint quality check, visualization of VISs, and the local genomic context; (2) a novel platform to discover integration patterns; and (3) a statistics interface for a comprehensive investigation of genotype-specific integration features. Data collected in the VIS Atlas help provide insights into virus pathogenic mechanisms and support the development of novel antitumor drugs. The VIS Atlas database is available at https://www.vis-atlas.tech/.
Non-coding genomic variants constitute the majority of trait-associated genome variations; however, identifying functional non-coding variants remains a challenge in human genetics. A method for systematically assessing the impact of regulatory variants on gene expression and linking these variants to potential target genes is still lacking. Here, we introduce a deep neural network (DNN)-based computational framework, RegVar, which can accurately predict the tissue-specific impact of non-coding regulatory variants on target genes. We show that by robustly learning the genomic characteristics of massive variant–gene expression associations in a variety of human tissues, RegVar vastly surpasses all current non-coding variant prioritization methods in predicting regulatory variants under different circumstances. The unique features of RegVar make it an excellent framework for assessing the regulatory impact of any variant on its putative target genes in a variety of tissues. RegVar is available as a web server at https://regvar.omic.tech/.
Ferroptosis is a form of regulated cell death driven by the accumulation of lipid hydroperoxides. Regulation of ferroptosis might be beneficial to cancer treatment. Non-coding RNAs (ncRNAs) are a class of RNA transcripts that generally cannot encode proteins and have been demonstrated to play critical roles in regulating ferroptosis. Herein, we developed ncFO, the ncRNA–ferroptosis association database, to document the manually curated and predicted ncRNAs that are associated with ferroptosis. Collectively, ncFO contains 90 experimentally verified entries, including 46 microRNAs (miRNAs), 21 long non-coding RNAs (lncRNAs), and 17 circular RNAs (circRNAs). In addition, ncFO incorporates two online prediction tools based on the regulation and co-expression of ncRNA and ferroptosis genes. Using default parameters, we obtained 3260 predicted entries, including 598 miRNAs and 178 lncRNAs, by regulation, as well as 2,592,661 predicted entries, including 967 miRNAs and 9632 lncRNAs, by ncRNA–ferroptosis gene co-expression in more than 8000 samples across 20 cancer types. The detailed information for each entry includes ncRNA name, disease, species, tissue, target, regulation, publication time, and PubMed identifier. ncFO also provides survival analysis and differential expression analysis for ncRNAs. In summary, ncFO offers a user-friendly platform to search and predict ferroptosis-associated ncRNAs, which might facilitate research on ferroptosis and the discovery of potential targets for cancer treatment. ncFO can be accessed at http://www.jianglab.cn/ncFO/.
In recent years, neoantigens have been recognized as ideal targets for tumor immunotherapy. With the development of neoantigen-based tumor immunotherapy, comprehensive neoantigen databases are urgently needed to meet the growing demand for clinical studies. We previously built the tumor-specific neoantigen database (TSNAdb), which has attracted much attention. In this study, we provide TSNAdb v2.0, an updated version of TSNAdb. TSNAdb v2.0 offers several new features, including (1) adopting more stringent criteria for neoantigen identification, (2) providing predicted neoantigens derived from three types of somatic mutations, and (3) collecting experimentally validated neoantigens and grouping them according to the experimental level. TSNAdb v2.0 is freely available at https://pgx.zju.edu.cn/tsnadb/.
Organs-on-a-chip is a microphysiological system that uses microfluidic technology to analyze the structure and function of living human cells at the tissue and organ levels in vitro. Compared with traditional two-dimensional cell culture and animal models, organs-on-a-chip technology can more closely simulate pathologic and toxicologic interactions between different organs or tissues and reflect the collaborative response of multiple organs to drugs. Although many organs-on-a-chip-related data have been published, no current database offers all of the following functions: searching, downloading, and analyzing data and results from the literature on organs-on-a-chip. Therefore, we created an organs-on-a-chip database (OOCDB) as a platform to integrate information about organs-on-a-chip from various sources, including literature, patents, raw data from microarray and transcriptome sequencing, several open-access datasets of organs-on-a-chip and organoids, and data generated in our laboratory. OOCDB contains dozens of sub-databases and analysis tools, and each sub-database contains various data associated with organs-on-a-chip, with the goal of providing researchers with a comprehensive, systematic, and convenient search engine. Furthermore, it offers a variety of other functions, such as mathematical modeling, three-dimensional modeling, and citation mapping, to meet the needs of researchers and promote the development of organs-on-a-chip. The OOCDB is available at http://www.organchip.cn.
Chromatin accessibility landscapes are essential for detecting regulatory elements, illustrating the corresponding regulatory networks, and, ultimately, understanding the molecular basis underlying key biological processes. With the advancement of sequencing technologies, a large volume of chromatin accessibility data has been accumulated and integrated for humans and other mammals. These data have greatly advanced the study of disease pathogenesis, cancer survival prognosis, and tissue development. To advance the understanding of molecular mechanisms regulating key plant traits and biological processes, we developed a comprehensive plant chromatin accessibility database (PlantCADB) from 649 samples of 37 species. These samples are abiotic stress-related (such as heat, cold, drought, and salt; 159 samples), development-related (232 samples), and/or tissue-specific (376 samples). Overall, 18,339,426 accessible chromatin regions (ACRs) were compiled. These ACRs were annotated with genomic information, associated genes, transcription factor footprints, motifs, and single-nucleotide polymorphisms (SNPs). Additionally, PlantCADB provides various tools to visualize ACRs and corresponding annotations. It thus forms an integrated, annotated, and analyzed plant chromatin accessibility resource, which can aid in better understanding genetic regulatory networks underlying development, important traits, stress adaptations, and evolution. PlantCADB is freely available at https://bioinfor.nefu.edu.cn/PlantCADB/.