The expression of linear DNA sequences is precisely regulated by the three-dimensional (3D) architecture of chromatin. Morphine-induced aberrant gene networks in neurons have been extensively investigated; however, how morphine affects the 3D genomic architecture of neurons remains unknown. Here, we applied digestion-ligation-only high-throughput chromosome conformation capture (DLO Hi-C) to investigate the effects of morphine on the 3D chromatin architecture of primate cortical neurons. After rhesus monkeys received continuous morphine administration for 90 days, we found that morphine rearranged chromosome territories, with a total of 391 segmented compartments switched. Morphine altered over half of the detected topologically associated domains (TADs); most altered TADs showed various types of shifts, followed by separation and fusion. Analysis of looping events at kilobase-scale resolution revealed that morphine increased both the number and the length of differential loops. Moreover, all differentially expressed genes identified from the RNA sequencing data mapped to specific TAD boundaries or differential loops, and their altered expression was further validated. Collectively, an altered 3D genomic architecture of cortical neurons may regulate the gene networks associated with morphine effects. Our findings provide critical hubs connecting chromosome spatial organization and gene networks associated with morphine effects in humans.
Title: Morphine Re-arranges Chromatin Spatial Architecture of Primate Cortical Neurons
Authors: Liang Wang, Xiaojie Wang, Chunqi Liu, Wei Xu, Weihong Kuang, Qian Bu, Hongchun Li, Ying Zhao, Linhong Jiang, Yaxing Chen, Feng Qin, Shu Li, Qinfan Wei, Xiaocong Liu, Bin Liu, Yuanyuan Chen, Yanping Dai, Hongbo Wang, Jingwei Tian, Gang Cao, Yinglan Zhao, Xiaobo Cen
Journal: Genomics, Proteomics & Bioinformatics; Pub Date: 2023-06-01; DOI: 10.1016/j.gpb.2023.03.003
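Compartment switching of the kind counted above is commonly called from the sign of the first principal component (PC1) of each bin's Hi-C correlation profile, with positive values taken as the active A compartment and negative values as the inactive B compartment. The sketch below assumes sign-oriented PC1 tracks per bin are already available for both conditions; the function names and values are hypothetical, and this is an illustration of the idea, not the authors' pipeline. It merges contiguous bins that switch in the same direction into segments.

```python
from itertools import groupby
from typing import List, Tuple

def call_compartments(pc1: List[float]) -> List[str]:
    """Label each bin 'A' if its PC1 value is positive, otherwise 'B'.

    Assumes PC1 is already sign-oriented (e.g. positive where gene density
    is high); that orientation step is not shown here.
    """
    return ["A" if value > 0 else "B" for value in pc1]

def switched_segments(pc1_control: List[float],
                      pc1_treated: List[float]) -> List[Tuple[int, int, str]]:
    """Return (first_bin, last_bin, transition) for each run of contiguous
    bins that switch compartment in the same direction between conditions."""
    if len(pc1_control) != len(pc1_treated):
        raise ValueError("PC1 tracks must cover the same bins")
    control = call_compartments(pc1_control)
    treated = call_compartments(pc1_treated)
    # One label per bin: e.g. 'A->B', or None if the compartment is unchanged
    transitions = [f"{c}->{t}" if c != t else None
                   for c, t in zip(control, treated)]
    segments = []
    start = 0
    for label, run in groupby(transitions):
        length = len(list(run))
        if label is not None:
            segments.append((start, start + length - 1, label))
        start += length
    return segments

# Hypothetical PC1 values for five consecutive bins
print(switched_segments([0.5, 0.4, 0.3, -0.2, -0.3],
                        [0.5, -0.1, -0.2, -0.2, 0.3]))
# [(1, 2, 'A->B'), (4, 4, 'B->A')]
```

Counting segments rather than individual bins keeps one biological switch spanning several adjacent bins from being counted several times.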
The white-blotched river stingray (Potamotrygon leopoldi) is a cartilaginous fish native to the Xingu River, a tributary of the Amazon River system. As a rare freshwater cartilaginous fish of the family Potamotrygonidae, in which no member previously had a sequenced genome, P. leopoldi can provide evolutionary details on fish phylogeny, niche adaptation, and skeleton formation. In this study, we present its 4.11-Gb draft genome, comprising 16,227 contigs and 13,238 scaffolds, with a contig N50 of 3937 kb and a scaffold N50 of 5675 kb. Our analysis shows that P. leopoldi is a slow-evolving fish that diverged from elephant sharks about 96 million years ago. Moreover, two immune-related gene families (immunoglobulin heavy constant delta genes and T-cell receptor alpha/delta variable genes) are expanded only in P. leopoldi. We also identified the Hox gene clusters in P. leopoldi and found that seven Hox genes shared by five representative fish species are missing in P. leopoldi. RNA sequencing data from P. leopoldi and three other fish species show that fishes have a more diversified tissue expression spectrum than mammals. Our functional studies suggest that the lack of the gc gene, which encodes vitamin D-binding protein, in cartilaginous fishes (both P. leopoldi and Callorhinchus milii) could partly explain the absence of hard bone in their endoskeleton. Overall, this genome resource provides new insights into the niche adaptation, body plan, and skeleton formation of P. leopoldi, as well as genome evolution in cartilaginous fishes.
Title: Draft Genome of White-blotched River Stingray Provides Novel Clues for Niche Adaptation and Skeleton Formation
Authors: Jingqi Zhou, Ake Liu, Funan He, Yunbin Zhang, Libing Shen, Jun Yu, Xiang Zhang
Journal: Genomics, Proteomics & Bioinformatics; Pub Date: 2023-06-01; DOI: 10.1016/j.gpb.2022.11.005
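The contig and scaffold N50 values quoted above follow the standard definition: sort sequences from longest to shortest and report the length at which the running total first reaches half of the total assembly size. A minimal sketch, with a hypothetical function name and toy lengths:

```python
from typing import Sequence

def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together cover
    at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid rounding from halving
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-negative lengths")

print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

The same function applies to contigs or scaffolds; only the input list changes. N50 measures contiguity, not base-level accuracy.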
Pub Date: 2023-06-01; Epub Date: 2021-12-07; DOI: 10.1016/j.gpb.2021.10.003
Fang-Yuan Shi, Yu Wang, Dong Huang, Yu Liang, Nan Liang, Xiao-Wei Chen, Ge Gao
Large-scale genome-wide association studies (GWAS) and expression quantitative trait locus (eQTL) studies have identified multiple non-coding variants that are associated with genetic diseases through their effects on gene expression. However, pinpointing causal variants effectively and efficiently remains a serious challenge. Here, we developed CARMEN, a novel algorithm to identify functional non-coding expression-modulating variants. Multiple evaluations demonstrated CARMEN's superior performance over state-of-the-art tools. Applying CARMEN to GWAS and eQTL datasets further pinpointed several causal variants other than the reported lead single-nucleotide polymorphisms (SNPs). CARMEN scales well to massive datasets and is available online as a web server at http://carmen.gao-lab.org.
Title: Computational Assessment of the Expression-modulating Potential for Non-coding Variants
Journal: Genomics, Proteomics & Bioinformatics
Pub Date: 2023-06-01; Epub Date: 2022-11-15; DOI: 10.1016/j.gpb.2022.11.001
Ruobing Han, Lei Han, Xunwu Zhao, Qianghui Wang, Yanling Xia, Heping Li
Despite the scientific and medicinal importance of the diploid sika deer (Cervus nippon), its genome resources are limited, and a haplotype-resolved chromosome-scale assembly is urgently needed. To explore the mechanisms underlying allele-specific gene expression patterns in antlers and chromosome evolution in Cervidae, we report, for the first time, a high-quality haplotype-resolved chromosome-scale genome of sika deer, built by integrating multiple sequencing strategies and anchored to 32 homologous groups with a pair of sex chromosomes (XY). Several expanded genes (RET, PPP2R1A, PPP2R1B, YWHAB, YWHAZ, and RPS6) and positively selected genes (eIF4E, Wnt8A, Wnt9B, BMP4, and TP53) were identified that could contribute to rapid antler growth without carcinogenesis. A comprehensive genome-wide analysis of allele expression patterns revealed that most alleles were functionally equivalent in regulating rapid antler growth and inhibiting oncogenesis. Comparative genomic analysis revealed that chromosome fission might have occurred during the divergence of sika deer and red deer (Cervus elaphus), and that the sense of smell of sika deer might be more acute than that of red deer. Prominent inversion regions containing olfactory receptor genes that arose after this divergence were also identified. In conclusion, this high-quality allele-aware reference genome provides a valuable resource for further study of the unique biological characteristics of antlers, chromosome evolution, and multi-omics research in cervids.
Title: Haplotype-resolved Genome of Sika Deer Reveals Allele-specific Gene Expression and Chromosome Evolution
Journal: Genomics, Proteomics & Bioinformatics
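A haplotype-resolved assembly makes it possible to assign RNA-seq reads at heterozygous sites to one haplotype or the other. One common way to flag allele-specific expression, though not necessarily the method used in this study, is an exact two-sided binomial test against equal expression of both alleles (p = 0.5). The function name and read counts below are hypothetical.

```python
from math import comb

def allelic_imbalance_pvalue(hap1_reads: int, hap2_reads: int) -> float:
    """Exact two-sided binomial test of H0: both haplotypes are expressed
    equally (success probability 0.5)."""
    n = hap1_reads + hap2_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    k = min(hap1_reads, hap2_reads)
    # P(X <= k) under Binomial(n, 0.5); the distribution is symmetric,
    # so doubling the smaller tail gives the two-sided p-value
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

print(allelic_imbalance_pvalue(9, 1))  # 0.021484375
```

In a genome-wide scan, p-values from many sites would also need multiple-testing correction (for example, Benjamini-Hochberg) before calling allele-specific genes.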
Pub Date: 2023-06-01; Epub Date: 2022-03-08; DOI: 10.1016/j.gpb.2022.02.002
Zheng Wang, Guihu Zhao, Bin Li, Zhenghuan Fang, Qian Chen, Xiaomeng Wang, Tengfei Luo, Yijing Wang, Qiao Zhou, Kuokuo Li, Lu Xia, Yi Zhang, Xun Zhou, Hongxu Pan, Yuwen Zhao, Yige Wang, Lin Wang, Jifeng Guo, Beisha Tang, Kun Xia, Jinchen Li
Non-coding variants in the human genome significantly influence human traits and complex diseases through their regulatory and modifying effects. Hence, an increasing number of computational methods have been developed to predict the effects of variants in human non-coding sequences. However, it is difficult for inexperienced users to select an appropriate method from the dozens available. To address this issue, we assessed 12 performance metrics of 24 methods on four independent non-coding variant benchmark datasets: (1) rare germline variants from ClinVar (clinically relevant sequence variants), (2) rare somatic variants from the Catalogue Of Somatic Mutations In Cancer (COSMIC), (3) common regulatory variants from curated expression quantitative trait locus (eQTL) data, and (4) disease-associated common variants from curated genome-wide association studies (GWAS). The 24 methods performed differently across conditions, indicating distinct strengths and weaknesses in different scenarios. Importantly, the performance of existing methods was acceptable for rare germline variants from ClinVar, with an area under the receiver operating characteristic curve (AUROC) of 0.4481-0.8033, but poor for rare somatic variants from COSMIC (AUROC = 0.4984-0.7131), common regulatory variants from curated eQTL data (AUROC = 0.4837-0.6472), and disease-associated common variants from curated GWAS (AUROC = 0.4766-0.5188). We also compared the performance of the 24 methods in predicting non-coding de novo mutations in autism spectrum disorder and found that the combined annotation-dependent depletion (CADD) and context-dependent tolerance score (CDTS) methods performed better than the other methods tested. In summary, we assessed the performance of 24 computational methods under diverse scenarios, providing preliminary advice for tool selection and guidance for developing new techniques to interpret non-coding variants.
Title: Performance Comparison of Computational Methods for the Prediction of the Function and Pathogenicity of Non-coding Variants
Journal: Genomics, Proteomics & Bioinformatics
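The AUROC figures above can be read through its rank interpretation: the probability that a randomly chosen positive (e.g. pathogenic) variant receives a higher score than a randomly chosen negative one, with ties counting one half. A minimal sketch with a hypothetical function name and toy scores:

```python
from typing import Sequence

def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as P(score of a random positive > score of a random negative),
    counting ties as 1/2. labels are 1 (positive) or 0 (negative)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

print(auroc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

An AUROC of 0.5 corresponds to random ranking, so the lower ends of the reported ranges (for example, 0.4481 for ClinVar and 0.4766 for GWAS) are slightly worse than chance. The pairwise loop is quadratic in the number of variants; for large benchmarks a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` is preferable.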
Pub Date: 2023-06-01; Epub Date: 2023-01-23; DOI: 10.1016/j.gpb.2023.01.004
Yan Zhang, Jingwen Zhang, Wei Zhang, Mohan Wang, Shuangqi Wang, Yao Xu, Lun Zhao, Xingwang Li, Guoliang Li
Studies of the lung cancer genome are indispensable for developing a cure for lung cancer. Whole-genome resequencing, genome-wide association studies, and transcriptome sequencing have greatly improved our understanding of the cancer genome. However, dysregulation of long-range chromatin interactions in lung cancer remains poorly described. To better understand the three-dimensional (3D) genomic interaction features of the lung cancer genome, we used the A549 cell line as a model system and generated high-resolution chromatin interactions associated with RNA polymerase II (RNAPII), CCCTC-binding factor (CTCF), enhancer of zeste homolog 2 (EZH2), and histone H3 lysine 27 trimethylation (H3K27me3) using long-read chromatin interaction analysis by paired-end tag sequencing (ChIA-PET). Analysis showed that EZH2/H3K27me3-mediated interactions further repressed target genes, through either loops or domains, and that their genomic distribution was distinct from and complementary to that of RNAPII-associated interactions. Cancer-related genes were highly enriched for chromatin interactions, and A549-specific chromatin interactions were associated with oncogenes and tumor suppressor genes, such as additional repressive interactions on FOXO4 and promoter-promoter interactions between NF1 and RNF135. Knockout of an anchor associated with chromatin interactions reversed the dysregulation of cancer-related genes, suggesting that chromatin interactions are essential for the proper expression of lung cancer-related genes. These findings delineate the 3D landscape and gene regulatory relationships of the lung cancer genome.
Title: Mapping Multi-factor-mediated Chromatin Interactions to Assess Dysregulation of Lung Cancer-related Genes
Journal: Genomics, Proteomics & Bioinformatics
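Interaction types such as the promoter-promoter loops between NF1 and RNF135 mentioned above are typically assigned by checking whether each loop anchor overlaps an annotated promoter. The sketch below is a generic illustration under that assumption, not the authors' pipeline; the function names and coordinates are hypothetical, and intervals use half-open coordinates as in BED files.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates, as in BED files
Interval = Tuple[str, int, int]

def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def classify_loop(anchor1: Interval, anchor2: Interval,
                  promoters: List[Interval]) -> str:
    """Label a loop 'P-P', 'P-O', or 'O-O' by how many anchors hit a promoter."""
    hit1 = any(overlaps(anchor1, p) for p in promoters)
    hit2 = any(overlaps(anchor2, p) for p in promoters)
    if hit1 and hit2:
        return "P-P"  # promoter-promoter
    if hit1 or hit2:
        return "P-O"  # promoter-other (e.g. a putative enhancer)
    return "O-O"

# Hypothetical promoter annotations and loop anchors
promoters = [("chr1", 1000, 2000), ("chr1", 50000, 51000)]
print(classify_loop(("chr1", 1500, 1600), ("chr1", 50500, 50600), promoters))  # P-P
```

The linear scan over promoters keeps the example short; genome-wide, an interval tree or a tool such as bedtools would be used for the overlap step.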
Predicting the response of cancer patients to different treatments and identifying biomarkers of drug response are two major goals of individualized medicine. Here, we developed a deep learning framework called TINDL, trained entirely on preclinical cancer cell lines (CCLs), to predict the response of cancer patients to different treatments. TINDL uses a tissue-informed normalization to account for the tissue and cancer type of the tumors and to reduce the statistical discrepancies between CCLs and patient tumors. Moreover, by making the deep learning black box interpretable, the model identifies a small set of genes whose expression levels are predictive of drug response in the trained model, enabling the identification of biomarkers of drug response. Using data from two large databases of CCLs and cancer tumors, we showed that this model can distinguish between sensitive and resistant tumors for 10 of 14 drugs, outperforming various other machine learning models. In addition, our small interfering RNA (siRNA) knockdown experiments on 10 genes identified by the model for one of the drugs (tamoxifen) confirmed that tamoxifen sensitivity is substantially influenced by all of these genes in MCF7 cells and by seven of them in T47D cells. Furthermore, genes implicated for multiple drugs pointed to shared mechanisms of action among drugs and suggested several important signaling pathways. In summary, this study provides a powerful deep learning framework for predicting drug response and identifying biomarkers of drug response in cancer. The code can be accessed at https://github.com/ddhostallero/tindl.
Title: Preclinical-to-clinical Anti-cancer Drug Response Prediction and Biomarker Identification Using TINDL
Authors: David Earl Hostallero, Lixuan Wei, Liewei Wang, Junmei Cairns, Amin Emad
Journal: Genomics, Proteomics & Bioinformatics; Pub Date: 2023-06-01; DOI: 10.1016/j.gpb.2023.01.006
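The abstract does not spell out how TINDL's tissue-informed normalization works. As an assumption-labeled sketch only, one simple way to remove tissue-level offsets is to z-score each gene within each tissue group, applied separately to cell lines and to tumors so that both are centered per tissue. This is not claimed to be TINDL's exact procedure; the function name and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List

def tissue_zscore(expression: List[List[float]],
                  tissues: List[str]) -> List[List[float]]:
    """Z-score every gene within each tissue group (hypothetical helper).

    expression: samples x genes matrix as nested lists
    tissues: tissue label for each sample, in the same order as expression
    Genes with zero variance within a tissue are mapped to 0.
    """
    if len(expression) != len(tissues):
        raise ValueError("one tissue label is required per sample")
    if not expression:
        return []
    groups: Dict[str, List[int]] = defaultdict(list)
    for index, tissue in enumerate(tissues):
        groups[tissue].append(index)
    n_genes = len(expression[0])
    normalized: List[List[float]] = [[] for _ in expression]
    for members in groups.values():
        for gene in range(n_genes):
            values = [expression[i][gene] for i in members]
            center = mean(values)
            spread = pstdev(values) or 1.0  # avoid dividing by zero
            for i in members:
                normalized[i].append((expression[i][gene] - center) / spread)
    return normalized

# Hypothetical data: two breast and two lung samples, two genes each
print(tissue_zscore([[1.0, 10.0], [3.0, 10.0], [100.0, 5.0], [200.0, 7.0]],
                    ["breast", "breast", "lung", "lung"]))
```

Normalizing within tissue means a model sees how a sample deviates from others of the same tissue, rather than differences that merely reflect tissue of origin.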
Pub Date: 2023-04-01; DOI: 10.1016/j.gpb.2022.04.001
Guangsheng Pei, Fangfang Yan, Lukas M. Simon, Yulin Dai, Peilin Jia, Zhongming Zhao
Single-cell RNA sequencing (scRNA-seq) is revolutionizing the study of complex and dynamic cellular mechanisms. However, cell type annotation remains a major challenge, as it largely relies on a priori knowledge and manual curation, which is cumbersome and subjective. The increasing number of scRNA-seq datasets, together with numerous published genetic studies, motivated us to build a comprehensive human cell type reference atlas. Here, we present decoding Cell type Specificity (deCS), an automatic cell type annotation method augmented by a comprehensive collection of human cell type expression profiles and marker genes. We used deCS to annotate scRNA-seq data from various tissue types and systematically evaluated annotation accuracy under different conditions, including reference panels, sequencing depth, and feature selection strategies. Our results demonstrate that expanding the references is critical for improving annotation accuracy. Compared with many existing state-of-the-art annotation tools, deCS significantly reduced computation time and increased accuracy. deCS can be integrated into the standard scRNA-seq analytical pipeline to enhance cell type annotation. Finally, we demonstrated the broad utility of deCS for identifying trait–cell type associations in 51 human complex traits, providing deep insights into the cellular mechanisms underlying disease pathogenesis. All deCS materials, including source code, user manual, demo data, and tutorials, are freely available at https://github.com/bsml320/deCS.
deCS: A Tool for Systematic Cell Type Annotations of Single-cell RNA Sequencing Data among Human Tissues. Guangsheng Pei, Fangfang Yan, Lukas M. Simon, Yulin Dai, Peilin Jia, Zhongming Zhao. Genomics, Proteomics & Bioinformatics, 2023-04-01. DOI: 10.1016/j.gpb.2022.04.001
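To make reference-based annotation concrete, here is a minimal sketch that assigns a query cell to the reference cell type whose expression profile correlates best with it over a chosen feature set. It is a generic correlation-based annotator, not deCS's actual scoring scheme; the function names, the toy reference panel, and all expression values are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy reference panel: mean expression per cell type.
# Gene names are common markers, but the values are invented for illustration.
REFERENCE: Dict[str, Dict[str, float]] = {
    "T cell":   {"CD3E": 9.0, "MS4A1": 0.5, "LYZ": 0.2},
    "B cell":   {"CD3E": 0.3, "MS4A1": 8.5, "LYZ": 0.4},
    "Monocyte": {"CD3E": 0.1, "MS4A1": 0.2, "LYZ": 9.5},
}
FEATURES: List[str] = ["CD3E", "MS4A1", "LYZ"]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def annotate_cell(cell: Dict[str, float],
                  reference: Dict[str, Dict[str, float]],
                  features: List[str]) -> Tuple[str, float]:
    """Return (best-matching cell type, its correlation) for one query cell."""
    # Genes absent from a profile are treated as zero expression.
    query = [cell.get(g, 0.0) for g in features]
    best_type, best_r = "unassigned", -math.inf
    for cell_type, profile in reference.items():
        r = pearson(query, [profile.get(g, 0.0) for g in features])
        if r > best_r:
            best_type, best_r = cell_type, r
    return best_type, best_r


if __name__ == "__main__":
    cell = {"CD3E": 0.0, "MS4A1": 6.0, "LYZ": 1.0}
    print(annotate_cell(cell, REFERENCE, FEATURES))
```

Restricting `FEATURES` to informative genes plays the role of a feature-selection step, and enlarging `REFERENCE` is the analogue of expanding the reference panel. Spearman correlation on ranks is a common, more outlier-robust alternative to Pearson here.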
Pub Date: 2023-04-01 | DOI: 10.1016/j.gpb.2023.02.005
Ye Chen, Yuyan Wang, Ping Zhou, Hao Huang, Rui Li, Zhen Zeng, Zifeng Cui, Rui Tian, Zhuang Jin, Jiashuo Liu, Zhaoyue Huang, Lifang Li, Zheying Huang, Xun Tian, Meiying Yu, Zheng Hu
Integration of oncogenic DNA viruses into the human genome is a key step in most virus-induced carcinogenesis. Here, we constructed the virus integration site (VIS) Atlas database, an extensive collection of integration breakpoints for the three most prevalent oncoviruses (human papillomavirus, hepatitis B virus, and Epstein–Barr virus), compiled from next-generation sequencing (NGS) data, the literature, and experimental data. The VIS Atlas database contains 63,179 breakpoints and 47,411 fully annotated junctional sequences, covering 47 virus genotypes and 17 disease types. It provides (1) a genome browser for NGS breakpoint quality checks, visualization of VISs, and inspection of the local genomic context; (2) a novel platform for discovering integration patterns; and (3) a statistics interface for a comprehensive investigation of genotype-specific integration features. The data collected in the VIS Atlas can provide insights into viral pathogenic mechanisms and aid the development of novel antitumor drugs. The VIS Atlas database is available at https://www.vis-atlas.tech/.
VIS Atlas: A Database of Virus Integration Sites in Human Genome from NGS Data to Explore Integration Patterns. Genomics, Proteomics & Bioinformatics, 2023-04-01. DOI: 10.1016/j.gpb.2023.02.005
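One simple kind of integration-pattern query is grouping nearby breakpoints into hotspots. The sketch below assumes a hypothetical minimal record (genotype, chromosome, position) rather than the actual VIS Atlas schema, merges breakpoints with single linkage inside an arbitrary 10 kb window, and uses invented coordinates.

```python
from collections import Counter, defaultdict
from typing import Dict, List, NamedTuple


class Breakpoint(NamedTuple):
    """Hypothetical minimal record; not the VIS Atlas schema."""
    genotype: str
    chrom: str
    pos: int


def _summarize(chrom: str, members: List[Breakpoint]) -> Dict:
    # members is sorted by position, so first/last give the cluster span.
    return {
        "chrom": chrom,
        "start": members[0].pos,
        "end": members[-1].pos,
        "count": len(members),
        "genotypes": dict(Counter(m.genotype for m in members)),
    }


def cluster_hotspots(bps: List[Breakpoint], window: int = 10_000) -> List[Dict]:
    """Single-linkage merge of breakpoints within `window` bp on the same chromosome."""
    by_chrom: Dict[str, List[Breakpoint]] = defaultdict(list)
    for bp in bps:
        by_chrom[bp.chrom].append(bp)
    clusters = []
    for chrom in sorted(by_chrom):
        points = sorted(by_chrom[chrom], key=lambda b: b.pos)
        current = [points[0]]
        for bp in points[1:]:
            if bp.pos - current[-1].pos <= window:
                current.append(bp)  # close enough: extend the current hotspot
            else:
                clusters.append(_summarize(chrom, current))
                current = [bp]      # gap too large: start a new hotspot
        clusters.append(_summarize(chrom, current))
    return clusters


# Invented coordinates and genotype labels, for illustration only.
TOY = [
    Breakpoint("HPV16", "chr8", 127_736_000),
    Breakpoint("HPV16", "chr8", 127_741_500),
    Breakpoint("HPV18", "chr8", 127_745_200),
    Breakpoint("HPV16", "chr8", 128_000_000),
    Breakpoint("HBV-C", "chr5", 1_295_100),
]

for hotspot in cluster_hotspots(TOY):
    print(hotspot)
```

Because merging is single-linkage, a dense run of breakpoints can chain into a cluster wider than `window`; counting breakpoints in fixed bins is an alternative when cluster width must be bounded. Chromosome names are sorted as plain strings, so "chr10" sorts before "chr2".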
Pub Date: 2023-04-01 | DOI: 10.1016/j.gpb.2022.04.007
Zhongqiu Li, Yiheng Hu, Xuelian Ma, Lingling Da, Jiajie She, Yue Liu, Xin Yi, Yaxin Cao, Wenying Xu, Yuannian Jiao, Zhen Su
Genetic and epigenetic changes after polyploidization events can result in variable gene expression and modified regulatory networks. Here, using large-scale transcriptome data, we constructed co-expression networks for diploid, tetraploid, and hexaploid wheat species and built a platform, named WheatCENet, for comparing the co-expression networks of allohexaploid wheat and its progenitors. WheatCENet supports searching and comparing specific functional co-expression networks, as well as identifying the related functions of the genes clustered therein. Functional annotations such as pathways, gene families, protein–protein interactions, microRNAs (miRNAs), and several types of epigenome data are integrated into this platform, together with Gene Ontology (GO) annotation, gene set enrichment analysis (GSEA), motif identification, and other useful tools. Using WheatCENet, we found that the network of WHEAT ABERRANT PANICLE ORGANIZATION 1 (WAPO1) contains more co-expressed genes related to spike development in hexaploid wheat than in its progenitors. We also found a novel CArG motif (CCWWWWWWGG) specifically in the promoter region of WAPO-A1, suggesting that neofunctionalization of the WAPO-A1 gene affects spikelet development in hexaploid wheat. WheatCENet is useful for investigating co-expression networks and conducting other analyses, and thus facilitates comparative and functional genomic studies in wheat. WheatCENet is freely available at http://bioinformatics.cpolar.cn/WheatCENet and http://bioinformatics.cau.edu.cn/WheatCENet.
WheatCENet: A Database for Comparative Co-expression Networks Analysis of Allohexaploid Wheat and Its Progenitors. Genomics, Proteomics & Bioinformatics, 2023-04-01. DOI: 10.1016/j.gpb.2022.04.007
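The CArG motif reported in the WAPO-A1 promoter, CCWWWWWWGG (W = A or T in IUPAC notation), maps directly onto a regular expression. This minimal sketch scans a sequence for it; the function names and the example sequence are hypothetical and are not taken from the wheat genome or from WheatCENet's motif tools.

```python
import re
from typing import List, Tuple

# CArG box: CC, then six W (A or T), then GG.
CARG_BOX = re.compile(r"CC[AT]{6}GG")


def find_carg(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched site) for every CArG box in seq."""
    # Input is upper-cased first. Two sites can never overlap (the W block
    # cannot contain C), so a non-overlapping finditer scan finds them all.
    return [(m.start(), m.group(0)) for m in CARG_BOX.finditer(seq.upper())]


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


promoter = "TTGCCATATAAGGACCTTTTTTGGA"  # hypothetical, not a WAPO-A1 sequence
print(find_carg(promoter))
print(find_carg(reverse_complement(promoter)))
```

Because the reverse complement of any CCWWWWWWGG site is itself a CCWWWWWWGG site, a single-strand scan already reports the sites on both strands; the second call simply shows the same two sites at mirrored positions.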