Messenger RNA splicing and degradation are critical for regulating gene expression, and their dysregulation leads to disease. Previous methods for estimating kinetic rates are limited because they assume uniform rates across cells. DeepKINET is a deep generative model that estimates splicing and degradation rates at single-cell resolution from scRNA-seq data. DeepKINET outperforms existing methods on simulated and metabolic labeling datasets. Applied to forebrain and breast cancer data, it identifies RNA-binding proteins responsible for kinetic rate diversity. DeepKINET also analyzes the effects of splicing factor mutations on target genes in erythroid lineage cells. DeepKINET effectively reveals cellular heterogeneity in post-transcriptional regulation.
Title: DeepKINET: a deep generative model for estimating single-cell RNA splicing and degradation rates. Authors: Chikara Mizukoshi, Yasuhiro Kojima, Satoshi Nomura, Shuto Hayashi, Ko Abe, Teppei Shimamura. Journal: Genome Biology. Pub Date: 2024-09-06; DOI: 10.1186/s13059-024-03367-8. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11378460/pdf/
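The DeepKINET abstract refers to per-cell splicing and degradation rates. As a rough illustration of what such rates mean, the sketch below uses the two-equation kinetic model that is common in RNA velocity analyses (unspliced RNA is produced at rate alpha and spliced at rate beta; spliced RNA is degraded at rate gamma). This model, the `CellRates` class, and the function names are assumptions made for illustration; the abstract does not state which equations DeepKINET uses.

```python
from dataclasses import dataclass


@dataclass
class CellRates:
    """Hypothetical per-cell kinetic rates for one gene."""
    alpha: float  # transcription rate
    beta: float   # splicing rate
    gamma: float  # degradation rate


def velocity(u: float, s: float, r: CellRates) -> tuple[float, float]:
    """Time derivatives of unspliced (u) and spliced (s) abundance.

    du/dt = alpha - beta * u
    ds/dt = beta * u - gamma * s
    """
    du = r.alpha - r.beta * u
    ds = r.beta * u - r.gamma * s
    return du, ds


def steady_state(r: CellRates) -> tuple[float, float]:
    """Abundances at which both derivatives vanish."""
    return r.alpha / r.beta, r.alpha / r.gamma


if __name__ == "__main__":
    # Two cells with the same transcription rate but different degradation rates.
    for rates in (CellRates(2.0, 1.0, 0.5), CellRates(2.0, 1.0, 1.0)):
        print(rates, steady_state(rates))
```

Under this model, two cells with identical transcription but different degradation rates settle at different spliced abundances, which is the kind of cell-to-cell difference that a uniform-rate assumption cannot represent.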
Pub Date: 2024-08-26; DOI: 10.1186/s13059-024-03371-y
Agustín Amalfitano, Nicolás Stocchi, Hugo Marcelo Atencio, Fernando Villarreal, Arjen Ten Have
Seqrutinator is an objective, flexible pipeline that removes sequences with sequencing and/or gene model errors and sequences from pseudogenes from complex, eukaryotic protein superfamilies. Testing Seqrutinator on major superfamilies BAHD, CYP, and UGT removes only 1.94% of SwissProt entries, 14% of entries from the model plant Arabidopsis thaliana, but 80% of entries from Pinus taeda's recent complete proteome. Application of Seqrutinator on crude BAHDomes, CYPomes, and UGTomes obtained from 16 plant proteomes shows convergence of the numbers of paralogues. MSAs, phylogenies, and particularly functional clustering improve drastically upon Seqrutinator application, indicating good performance.
Title: Seqrutinator: scrutiny of large protein superfamily sequence datasets for the identification and elimination of non-functional homologues. Journal: Genome Biology. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11346255/pdf/
Pub Date: 2024-07-15; DOI: 10.1186/s13059-024-03307-6
Wen-Jou Chang, Maria S Baker, Eleonora Laritsky, Chathura J Gunasekara, Uditha Maduranga, Justine C Galliou, Joseph W McFadden, Jessica R Waltemyer, Bruce Berggren-Thomas, Brianna N Tate, Hanxue Zhang, Benjamin D Rosen, Curtis P Van Tassell, George E Liu, Cristian Coarfa, Yi Athena Ren, Robert A Waterland
Background: We recently identified ~10,000 correlated regions of systemic interindividual epigenetic variation (CoRSIVs) in the human genome. These methylation variants are amenable to population studies, as DNA methylation measurements in blood provide information on epigenetic regulation throughout the body. Moreover, establishment of DNA methylation at human CoRSIVs is labile to periconceptional influences such as nutrition. Here, we analyze publicly available whole-genome bisulfite sequencing data on multiple tissues of each of two Holstein cows to determine whether CoRSIVs exist in cattle.
Results: Focusing on genomic blocks with ≥ 5 CpGs and a systemic interindividual variation index of at least 20, our approach identifies 217 cattle CoRSIVs, a subset of which we independently validate by bisulfite pyrosequencing. Similar to human CoRSIVs, those in cattle are strongly associated with genetic variation. Also as in humans, we show that establishment of DNA methylation at cattle CoRSIVs is particularly sensitive to early embryonic environment, in the context of embryo culture during assisted reproduction.
Conclusions: Our data indicate that CoRSIVs exist in cattle, as in humans, suggesting these systemic epigenetic variants may be common to mammals in general. To the extent that individual epigenetic variation at cattle CoRSIVs affects phenotypic outcomes, assessment of CoRSIV methylation at birth may become an important tool for optimizing agriculturally important traits. Moreover, adjusting embryo culture conditions during assisted reproduction may provide opportunities to tailor agricultural outcomes by engineering CoRSIV methylation profiles.
Title: Systemic interindividual DNA methylation variants in cattle share major hallmarks with those in humans. Journal: Genome Biology. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11247883/pdf/
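The two thresholds quoted in the Results (at least 5 CpGs per genomic block and a systemic interindividual variation index of at least 20) can be read as a simple filter over candidate blocks. The sketch below only illustrates that filtering step; the `Block` record, its field names, and the toy values are hypothetical and are not taken from the paper, and the computation of the index itself is not shown.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """Hypothetical record for one genomic block."""
    block_id: str
    n_cpgs: int        # number of CpG sites in the block
    siv_index: float   # systemic interindividual variation index


def candidate_corsivs(blocks, min_cpgs=5, min_index=20.0):
    """Keep blocks that meet both thresholds stated in the abstract."""
    return [b for b in blocks
            if b.n_cpgs >= min_cpgs and b.siv_index >= min_index]


if __name__ == "__main__":
    # Toy values for illustration only.
    toy = [
        Block("blk1", 7, 35.2),   # passes both thresholds
        Block("blk2", 4, 50.0),   # too few CpGs
        Block("blk3", 12, 19.9),  # index just below the cutoff
        Block("blk4", 5, 20.0),   # exactly at both cutoffs, kept
    ]
    for b in candidate_corsivs(toy):
        print(b.block_id)
```

Both comparisons are inclusive, matching the "≥ 5" and "at least 20" wording.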
Pathogenic allele silencing is a promising treatment for genetic hereditary diseases. Here, we develop an RNA-cleaving tool, TaqTth-hpRNA, consisting of a small, chimeric TaqTth, and a hairpin RNA guiding probe. With a minimal flanking sequence-motif requirement, in vitro and in vivo studies show TaqTth-hpRNA cleaves RNA efficiently and specifically. In an Alzheimer's disease model, we demonstrate silencing of mutant APPswe mRNA without altering the wild-type APP mRNA. Notably, due to the compact size of TaqTth, we are able to combine with APOE2 overexpression in a single AAV vector, which results in stronger inhibition of pathologies.
Title: TaqTth-hpRNA: a novel compact RNA-targeting tool for specific silencing of pathogenic mRNA. Authors: Chong Xu, Jiyanuo Cao, Huanran Qiang, Yu Liu, Jialin Wu, Qiudan Luo, Meng Wan, Yujie Wang, Peiliang Wang, Qian Cheng, Guohua Zhou, Jian Sima, Yongjian Guo, Shu Xu. Journal: Genome Biology. Pub Date: 2024-07-07; DOI: 10.1186/s13059-024-03326-3. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11229350/pdf/
Background: Extensive structural variation and frequent introgression contribute substantially to the genetic diversity of wheat, but the large and complex genome of polyploid wheat hinders efficient genotyping of the many available varieties for accurate identification, management, and exploitation of germplasm resources.
Results: We develop a novel workflow that identifies 1240 high-quality large copy number variation blocks (CNVb) in wheat at the pan-genome level, demonstrating that CNVb can serve as an ideal DNA fingerprinting marker for discriminating massive varieties, with the accuracy validated by PCR assay. We then construct a digitalized genotyping CNVb map across 1599 global wheat accessions. Key CNVb markers are linked with trait-associated introgressions, such as the 1RS·1BL translocation and 2NvS translocation, and the beneficial alleles, such as the end-use quality allele Glu-D1d (Dx5 + Dy10) and the semi-dwarf r-e-z allele. Furthermore, we demonstrate that these tagged CNVb markers promote a stable and cost-effective strategy for evaluating wheat germplasm resources with ultra-low-coverage sequencing data, competing with SNP array for applications such as evaluating new varieties, efficient management of collections in gene banks, and describing wheat germplasm resources in a digitalized manner. We also develop a user-friendly interactive platform, WheatCNVb ( http://wheat.cau.edu.cn/WheatCNVb/ ), for exploring the CNVb profiles over ever-increasing wheat accessions, and also propose a QR-code-like representation of individual digital CNVb fingerprint. This platform also allows uploading new CNVb profiles for comparison with stored varieties.
Conclusions: The CNVb-based approach provides a low-cost and high-throughput genotyping strategy for enabling digitalized wheat germplasm management and modern breeding with precise and practical decision-making.
Title: Tagging large CNV blocks in wheat boosts digitalization of germplasm resources by ultra-low-coverage sequencing. Authors: Jianxia Niu, Wenxi Wang, Zihao Wang, Zhe Chen, Xiaoyu Zhang, Zhen Qin, Lingfeng Miao, Zhengzhao Yang, Chaojie Xie, Mingming Xin, Huiru Peng, Yingyin Yao, Jie Liu, Zhongfu Ni, Qixin Sun, Weilong Guo. Journal: Genome Biology. Pub Date: 2024-07-01; DOI: 10.1186/s13059-024-03315-6. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11218387/pdf/
Pub Date: 2024-07-01; DOI: 10.1186/s13059-024-03312-9
Kevin Lamkiewicz, Lisa-Marie Barf, Konrad Sachse, Martin Hölzer
Microbial pangenome analysis identifies present or absent genes in prokaryotic genomes. However, current tools are limited when analyzing species with higher sequence diversity or higher taxonomic orders such as genera or families. The Roary ILP Bacterial core Annotation Pipeline (RIBAP) uses an integer linear programming approach to refine gene clusters predicted by Roary for identifying core genes. RIBAP successfully handles the complexity and diversity of Chlamydia, Klebsiella, Brucella, and Enterococcus genomes, outperforming other established and recent pangenome tools for identifying all-encompassing core genes at the genus level. RIBAP is a freely available Nextflow pipeline at github.com/hoelzer-lab/ribap and zenodo.org/doi/10.5281/zenodo.10890871.
Title: RIBAP: a comprehensive bacterial core genome annotation pipeline for pangenome calculation beyond the species level. Journal: Genome Biology. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11218241/pdf/
Pub Date: 2024-07-01; DOI: 10.1186/s13059-024-03314-7
Daniel R Tabet, Da Kuang, Megan C Lancaster, Roujia Li, Karen Liu, Jochen Weile, Atina G Coté, Yingzhou Wu, Robert A Hegele, Dan M Roden, Frederick P Roth
Background: Computational variant effect predictors offer a scalable and increasingly reliable means of interpreting human genetic variation, but concerns of circularity and bias have limited previous methods for evaluating and comparing predictors. Population-level cohorts of genotyped and phenotyped participants that have not been used in predictor training can facilitate an unbiased benchmarking of available methods. Using a curated set of human gene-trait associations with a reported rare-variant burden association, we evaluate the correlations of 24 computational variant effect predictors with associated human traits in the UK Biobank and All of Us cohorts.
Results: AlphaMissense outperformed all other predictors in inferring human traits based on rare missense variants in UK Biobank and All of Us participants. The overall rankings of computational variant effect predictors in these two cohorts showed a significant positive correlation.
Conclusion: We describe a method to assess computational variant effect predictors that sidesteps the limitations of previous evaluations. This approach is generalizable to future predictors and could continue to inform predictor choice for personal and clinical genetics.
Title: Benchmarking computational variant effect predictors by their ability to infer human traits. Journal: Genome Biology. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11218265/pdf/
Pub Date: 2024-01-31; DOI: 10.1186/s13059-024-03175-0
Elena M Pugacheva, Dharmendra Nath Bhatt, Samuel Rivero-Hinojosa, Md Tajmul, Liron Fedida, Emma Price, Yon Ji, Dmitri Loukinov, Alexander V Strunnikov, Bing Ren, Victor V Lobanenkov
Background: Pervasive usage of alternative promoters leads to the deregulation of gene expression in carcinogenesis and may drive the emergence of new genes in spermatogenesis. However, little is known regarding the mechanisms underpinning the activation of alternative promoters.
Results: Here we describe how alternative cancer-testis-specific transcription is activated. We show that intergenic and intronic CTCF binding sites, which are transcriptionally inert in normal somatic cells, could be epigenetically reprogrammed into active de novo promoters in germ and cancer cells. BORIS/CTCFL, the testis-specific paralog of the ubiquitously expressed CTCF, triggers the epigenetic reprogramming of CTCF sites into units of active transcription. BORIS binding initiates the recruitment of the chromatin remodeling factor, SRCAP, followed by the replacement of H2A histone with H2A.Z, resulting in a more relaxed chromatin state in the nucleosomes flanking the CTCF binding sites. The relaxation of chromatin around CTCF binding sites facilitates the recruitment of multiple additional transcription factors, thereby activating transcription from a given binding site. We demonstrate that the epigenetically reprogrammed CTCF binding sites can drive the expression of cancer-testis genes, long noncoding RNAs, retro-pseudogenes, and dormant transposable elements.
Conclusions: Thus, BORIS functions as a transcription factor that epigenetically reprograms clustered CTCF binding sites into transcriptional start sites, promoting transcription from alternative promoters in both germ cells and cancer cells.
Title: BORIS/CTCFL epigenetically reprograms clustered CTCF binding sites into alternative transcriptional start sites. Journal: Genome Biology. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10832218/pdf/
Pub Date: 2024-01-31; DOI: 10.1186/s13059-024-03171-4
Sarah Fazal, Matt C Danzi, Isaac Xu, Shilpa Nadimpalli Kobren, Shamil Sunyaev, Chloe Reuter, Shruti Marwaha, Matthew Wheeler, Egor Dolzhenko, Francesca Lucas, Stefan Wuchty, Mustafa Tekin, Stephan Züchner, Vanessa Aguiar-Pulido
Expansions of tandem repeats (TRs) cause approximately 60 monogenic diseases. We expect that the discovery of additional pathogenic repeat expansions will narrow the diagnostic gap in many diseases. A growing number of TR expansions are being identified, and interpreting them is a challenge. We present RExPRT (Repeat EXpansion Pathogenicity pRediction Tool), a machine learning tool for distinguishing pathogenic from benign TR expansions. Our results demonstrate that an ensemble approach classifies TRs with an average precision of 93% and recall of 83%. RExPRT's high precision will be valuable in large-scale discovery studies, which require prioritization of candidate loci for follow-up studies.
Title: RExPRT: a machine learning tool to predict pathogenicity of tandem repeat loci. Journal: Genome Biology. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10832122/pdf/
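For readers less familiar with the two metrics reported for RExPRT, precision is the fraction of loci called pathogenic that truly are pathogenic, and recall is the fraction of truly pathogenic loci that are called. The sketch below computes both from confusion-matrix counts. The counts in the example are invented solely so that the arithmetic lands near the reported 93% and 83%; they are not data from the paper, and the function name is hypothetical.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive,
    and false-negative counts.

    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    """
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("precision or recall is undefined for these counts")
    return tp / (tp + fp), tp / (tp + fn)


if __name__ == "__main__":
    # Invented counts for illustration only (not from the paper).
    p, r = precision_recall(tp=93, fp=7, fn=19)
    print(f"precision={p:.2f} recall={r:.2f}")
```

High precision with lower recall means few false alarms among the prioritized loci at the cost of missing some true positives, which fits the abstract's emphasis on prioritizing candidates for follow-up.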
Pub Date: 2024-01-31; DOI: 10.1186/s13059-024-03170-5
Alice Antonello, Riccardo Bergamin, Nicola Calonaci, Jacob Househam, Salvatore Milite, Marc J Williams, Fabio Anselmi, Alberto d'Onofrio, Vasavi Sundaram, Alona Sosinsky, William C H Cross, Giulio Caravagna
Copy number alterations (CNAs) are among the most important genetic events in cancer, but their detection from sequencing data is challenging because of unknown sample purity, tumor ploidy, and general intra-tumor heterogeneity. Here, we present CNAqc, an evolution-inspired method to perform the computational validation of clonal and subclonal CNAs detected from bulk DNA sequencing. CNAqc is validated using single-cell data and simulations, is applied to over 4000 TCGA and PCAWG samples, and is incorporated into the validation process for the clinically accredited bioinformatics pipeline at Genomics England. CNAqc is designed to support automated quality control procedures for tumor somatic data validation.
Title: Computational validation of clonal and subclonal copy number alterations from bulk tumor sequencing using CNAqc. Journal: Genome Biology. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10832148/pdf/
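The CNAqc abstract notes that sample purity and tumor ploidy complicate CNA detection. One widely used relationship that ties these quantities together is the expected variant allele frequency (VAF) of a clonal mutation, given purity, the tumor's total copy number at the locus, and the number of mutated copies. The sketch below implements only that textbook formula, assuming a diploid normal contaminant; the abstract does not describe CNAqc's internals, so this should not be read as its implementation, and the function and parameter names are hypothetical.

```python
def expected_vaf(multiplicity: int, total_cn: int, purity: float,
                 normal_cn: int = 2) -> float:
    """Expected VAF of a clonal somatic mutation in a bulk sample.

    VAF = m * pi / (pi * n_tumor + (1 - pi) * n_normal)

    multiplicity: mutated copies per tumor cell (m)
    total_cn:     tumor total copy number at the locus (n_tumor)
    purity:       fraction of tumor cells in the sample (pi), in (0, 1]
    normal_cn:    copy number in normal cells (assumed 2)
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    if not 0 < multiplicity <= total_cn:
        raise ValueError("multiplicity must be between 1 and total_cn")
    denom = purity * total_cn + (1.0 - purity) * normal_cn
    return multiplicity * purity / denom


if __name__ == "__main__":
    # Heterozygous mutation in a diploid segment at 50% purity.
    print(expected_vaf(multiplicity=1, total_cn=2, purity=0.5))
    # Mutation on the duplicated allele of a 2:1 segment at 80% purity.
    print(round(expected_vaf(multiplicity=2, total_cn=3, purity=0.8), 4))
```

Comparing observed VAF peaks with values predicted this way is one way a mismatch between called copy numbers and assumed purity can surface during quality control.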