Pub Date: 2024-11-14. DOI: 10.1186/s13059-024-03432-2
Tianyu Yuan, Hao Yan, Kevin C Li, Ivan Surovtsev, Megan C King, Simon G J Mochrie
Background: Inhomogeneous patterns of chromatin-chromatin contacts within 10-100-kb-sized regions of the genome are a generic feature of chromatin spatial organization. These features, termed topologically associating domains (TADs), have led to the loop extrusion factor (LEF) model. Currently, our ability to model TADs relies on the observation that in vertebrates TAD boundaries are correlated with DNA sequences that bind CTCF, which therefore is inferred to block loop extrusion. However, although TADs feature prominently in their Hi-C maps, non-vertebrate eukaryotes either do not express CTCF or show few TAD boundaries that correlate with CTCF sites. In all of these organisms, the counterparts of CTCF remain unknown, frustrating comparisons between Hi-C data and simulations.
Results: To extend the LEF model across the tree of life, here, we propose the conserved-current loop extrusion (CCLE) model that interprets loop-extruding cohesin as a nearly conserved probability current. From cohesin ChIP-seq data alone, we derive a position-dependent loop extrusion rate, allowing for a modified paradigm for loop extrusion, that goes beyond solely localized barriers to also include loop extrusion rates that vary continuously. We show that CCLE accurately predicts the TAD-scale Hi-C maps of interphase Schizosaccharomyces pombe, as well as those of meiotic and mitotic Saccharomyces cerevisiae, demonstrating its utility in organisms lacking CTCF.
Conclusions: The success of CCLE in yeasts suggests that loop extrusion by cohesin is indeed the primary mechanism underlying TADs in these systems. CCLE allows us to obtain loop extrusion parameters such as the LEF density and processivity, which compare well to independent estimates.
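The conserved-current idea in the Results can be illustrated with a minimal sketch. It assumes (as a simplification, not the authors' published derivation) that at steady state the current, taken as occupancy times local extrusion rate, is the same at every site, so the local rate is inversely proportional to the cohesin ChIP-seq signal. The function name, the `current` parameter, and the sample values are hypothetical.

```python
def extrusion_rates(occupancy, current=1.0):
    """Return per-site extrusion rates under a conserved-current assumption.

    Assumption (illustrative only): current = occupancy[i] * rate[i] is the
    same constant at every site, so rate[i] = current / occupancy[i].
    """
    if any(o <= 0 for o in occupancy):
        raise ValueError("occupancy values must be positive")
    return [current / o for o in occupancy]


# Hypothetical normalized cohesin ChIP-seq signal at four genomic bins.
chip_signal = [1.0, 2.0, 4.0, 0.5]
rates = extrusion_rates(chip_signal)

# High-occupancy bins get slow extrusion; the product stays constant.
for signal, rate in zip(chip_signal, rates):
    print(f"signal={signal:.2f}  rate={rate:.2f}  current={signal * rate:.2f}")
```

In this toy picture a peak in the ChIP-seq track acts like a soft barrier: extrusion slows there without requiring a discrete blocking element such as CTCF.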
Title: Cohesin distribution alone predicts chromatin organization in yeast via conserved-current loop extrusion. Journal: Genome Biology 25(1):293. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11566905/pdf/
Messenger RNA splicing and degradation are critical for the regulation of gene expression, and their dysregulation leads to disease. Previous methods for estimating kinetic rates are limited because they assume uniform rates across cells. DeepKINET is a deep generative model that estimates splicing and degradation rates at single-cell resolution from scRNA-seq data. DeepKINET outperforms existing methods on simulated and metabolic labeling datasets. Applied to forebrain and breast cancer data, it identifies RNA-binding proteins responsible for kinetic rate diversity. DeepKINET also analyzes the effects of splicing factor mutations on target genes in erythroid lineage cells. DeepKINET effectively reveals cellular heterogeneity in post-transcriptional regulation.
Title: DeepKINET: a deep generative model for estimating single-cell RNA splicing and degradation rates. Authors: Chikara Mizukoshi, Yasuhiro Kojima, Satoshi Nomura, Shuto Hayashi, Ko Abe, Teppei Shimamura. Pub Date: 2024-09-06. DOI: 10.1186/s13059-024-03367-8. Journal: Genome Biology 25(1):229. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11378460/pdf/
Pub Date: 2024-08-26. DOI: 10.1186/s13059-024-03371-y
Agustín Amalfitano, Nicolás Stocchi, Hugo Marcelo Atencio, Fernando Villarreal, Arjen Ten Have
Seqrutinator is an objective, flexible pipeline that removes sequences with sequencing and/or gene model errors and sequences from pseudogenes from complex, eukaryotic protein superfamilies. Testing Seqrutinator on major superfamilies BAHD, CYP, and UGT removes only 1.94% of SwissProt entries, 14% of entries from the model plant Arabidopsis thaliana, but 80% of entries from Pinus taeda's recent complete proteome. Application of Seqrutinator on crude BAHDomes, CYPomes, and UGTomes obtained from 16 plant proteomes shows convergence of the numbers of paralogues. MSAs, phylogenies, and particularly functional clustering improve drastically upon Seqrutinator application, indicating good performance.
Title: Seqrutinator: scrutiny of large protein superfamily sequence datasets for the identification and elimination of non-functional homologues. Journal: Genome Biology 25(1):230. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11346255/pdf/
Pub Date: 2024-07-15. DOI: 10.1186/s13059-024-03307-6
Wen-Jou Chang, Maria S Baker, Eleonora Laritsky, Chathura J Gunasekara, Uditha Maduranga, Justine C Galliou, Joseph W McFadden, Jessica R Waltemyer, Bruce Berggren-Thomas, Brianna N Tate, Hanxue Zhang, Benjamin D Rosen, Curtis P Van Tassell, George E Liu, Cristian Coarfa, Yi Athena Ren, Robert A Waterland
Background: We recently identified ~ 10,000 correlated regions of systemic interindividual epigenetic variation (CoRSIVs) in the human genome. These methylation variants are amenable to population studies, as DNA methylation measurements in blood provide information on epigenetic regulation throughout the body. Moreover, establishment of DNA methylation at human CoRSIVs is labile to periconceptional influences such as nutrition. Here, we analyze publicly available whole-genome bisulfite sequencing data on multiple tissues of each of two Holstein cows to determine whether CoRSIVs exist in cattle.
Results: Focusing on genomic blocks with ≥ 5 CpGs and a systemic interindividual variation index of at least 20, our approach identifies 217 cattle CoRSIVs, a subset of which we independently validate by bisulfite pyrosequencing. Similar to human CoRSIVs, those in cattle are strongly associated with genetic variation. Also as in humans, we show that establishment of DNA methylation at cattle CoRSIVs is particularly sensitive to early embryonic environment, in the context of embryo culture during assisted reproduction.
Conclusions: Our data indicate that CoRSIVs exist in cattle, as in humans, suggesting these systemic epigenetic variants may be common to mammals in general. To the extent that individual epigenetic variation at cattle CoRSIVs affects phenotypic outcomes, assessment of CoRSIV methylation at birth may become an important tool for optimizing agriculturally important traits. Moreover, adjusting embryo culture conditions during assisted reproduction may provide opportunities to tailor agricultural outcomes by engineering CoRSIV methylation profiles.
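The two screening thresholds named in the Results (at least 5 CpGs per block and a systemic interindividual variation index of at least 20) can be sketched as a simple filter. This is only an illustration of the threshold step: the `Block` fields, block names, and values are hypothetical, and the abstract does not describe how the index itself is computed or what other steps the full analysis includes.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A genomic block with hypothetical summary fields."""
    name: str
    n_cpgs: int        # number of CpG sites in the block
    siv_index: float   # systemic interindividual variation index


def passes_thresholds(block, min_cpgs=5, min_siv=20.0):
    """Apply the two cutoffs quoted in the abstract (both inclusive)."""
    return block.n_cpgs >= min_cpgs and block.siv_index >= min_siv


# Made-up example blocks; none of these are real cattle loci.
blocks = [
    Block("block_a", 7, 31.5),
    Block("block_b", 4, 45.0),   # too few CpGs
    Block("block_c", 12, 19.9),  # index just below the cutoff
    Block("block_d", 5, 20.0),   # exactly at both cutoffs
]

candidates = [b.name for b in blocks if passes_thresholds(b)]
print(candidates)
```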
Title: Systemic interindividual DNA methylation variants in cattle share major hallmarks with those in humans. Journal: Genome Biology 25(1):185. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11247883/pdf/
Pathogenic allele silencing is a promising treatment for hereditary diseases. Here, we develop an RNA-cleaving tool, TaqTth-hpRNA, consisting of a small chimeric protein, TaqTth, and a hairpin RNA guiding probe. With a minimal flanking sequence-motif requirement, in vitro and in vivo studies show that TaqTth-hpRNA cleaves RNA efficiently and specifically. In an Alzheimer's disease model, we demonstrate silencing of mutant APPswe mRNA without altering the wild-type APP mRNA. Notably, due to the compact size of TaqTth, we are able to combine it with APOE2 overexpression in a single AAV vector, which results in stronger inhibition of pathologies.
Title: TaqTth-hpRNA: a novel compact RNA-targeting tool for specific silencing of pathogenic mRNA. Authors: Chong Xu, Jiyanuo Cao, Huanran Qiang, Yu Liu, Jialin Wu, Qiudan Luo, Meng Wan, Yujie Wang, Peiliang Wang, Qian Cheng, Guohua Zhou, Jian Sima, Yongjian Guo, Shu Xu. Pub Date: 2024-07-07. DOI: 10.1186/s13059-024-03326-3. Journal: Genome Biology 25(1):179. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11229350/pdf/
Background: Massive structural variation and frequent introgression contribute substantially to the genetic diversity of wheat, but the huge and complex genome of polyploid wheat hinders efficient genotyping of the many existing varieties, which is needed for accurate identification, management, and exploitation of germplasm resources.
Results: We develop a novel workflow that identifies 1240 high-quality large copy number variation blocks (CNVb) in wheat at the pan-genome level, demonstrating that CNVb can serve as an ideal DNA fingerprinting marker for discriminating among large numbers of varieties, with the accuracy validated by PCR assay. We then construct a digitalized CNVb genotyping map across 1599 global wheat accessions. Key CNVb markers are linked with trait-associated introgressions, such as the 1RS·1BL translocation and the 2NvS translocation, and with beneficial alleles, such as the end-use quality allele Glu-D1d (Dx5 + Dy10) and the semi-dwarf r-e-z allele. Furthermore, we demonstrate that these tagged CNVb markers support a stable and cost-effective strategy for evaluating wheat germplasm resources with ultra-low-coverage sequencing data, competing with SNP arrays for applications such as evaluating new varieties, efficiently managing collections in gene banks, and describing wheat germplasm resources in a digitalized manner. We also develop a user-friendly interactive platform, WheatCNVb ( http://wheat.cau.edu.cn/WheatCNVb/ ), for exploring the CNVb profiles of an ever-increasing number of wheat accessions, and propose a QR-code-like representation of an individual digital CNVb fingerprint. This platform also allows uploading new CNVb profiles for comparison with stored varieties.
Conclusions: The CNVb-based approach provides a low-cost and high-throughput genotyping strategy for enabling digitalized wheat germplasm management and modern breeding with precise and practical decision-making.
Title: Tagging large CNV blocks in wheat boosts digitalization of germplasm resources by ultra-low-coverage sequencing. Authors: Jianxia Niu, Wenxi Wang, Zihao Wang, Zhe Chen, Xiaoyu Zhang, Zhen Qin, Lingfeng Miao, Zhengzhao Yang, Chaojie Xie, Mingming Xin, Huiru Peng, Yingyin Yao, Jie Liu, Zhongfu Ni, Qixin Sun, Weilong Guo. Pub Date: 2024-07-01. DOI: 10.1186/s13059-024-03315-6. Journal: Genome Biology 25(1):171. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11218387/pdf/
Pub Date: 2024-07-01. DOI: 10.1186/s13059-024-03312-9
Kevin Lamkiewicz, Lisa-Marie Barf, Konrad Sachse, Martin Hölzer
Microbial pangenome analysis identifies present or absent genes in prokaryotic genomes. However, current tools are limited when analyzing species with higher sequence diversity or higher taxonomic orders such as genera or families. The Roary ILP Bacterial core Annotation Pipeline (RIBAP) uses an integer linear programming approach to refine gene clusters predicted by Roary for identifying core genes. RIBAP successfully handles the complexity and diversity of Chlamydia, Klebsiella, Brucella, and Enterococcus genomes, outperforming other established and recent pangenome tools for identifying all-encompassing core genes at the genus level. RIBAP is a freely available Nextflow pipeline at github.com/hoelzer-lab/ribap and zenodo.org/doi/10.5281/zenodo.10890871.
Title: RIBAP: a comprehensive bacterial core genome annotation pipeline for pangenome calculation beyond the species level. Journal: Genome Biology 25(1):170. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11218241/pdf/
Pub Date: 2024-07-01. DOI: 10.1186/s13059-024-03314-7
Daniel R Tabet, Da Kuang, Megan C Lancaster, Roujia Li, Karen Liu, Jochen Weile, Atina G Coté, Yingzhou Wu, Robert A Hegele, Dan M Roden, Frederick P Roth
Background: Computational variant effect predictors offer a scalable and increasingly reliable means of interpreting human genetic variation, but concerns of circularity and bias have limited previous methods for evaluating and comparing predictors. Population-level cohorts of genotyped and phenotyped participants that have not been used in predictor training can facilitate an unbiased benchmarking of available methods. Using a curated set of human gene-trait associations with a reported rare-variant burden association, we evaluate the correlations of 24 computational variant effect predictors with associated human traits in the UK Biobank and All of Us cohorts.
Results: AlphaMissense outperformed all other predictors in inferring human traits based on rare missense variants in UK Biobank and All of Us participants. The overall rankings of computational variant effect predictors in these two cohorts showed a significant positive correlation.
Conclusion: We describe a method to assess computational variant effect predictors that sidesteps the limitations of previous evaluations. This approach is generalizable to future predictors and could continue to inform predictor choice for personal and clinical genetics.
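The cross-cohort comparison of predictor rankings in the Results can be illustrated with a rank correlation. The abstract does not say which correlation statistic was used, so Spearman's rho here is an assumption, and the predictor scores below are invented for illustration only. The sketch uses the classic no-ties formula and rejects tied inputs rather than handling them.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)), valid without ties."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    if len(set(x)) != len(x) or len(set(y)) != len(y):
        raise ValueError("ties are not supported by this simple formula")
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Invented performance scores for five hypothetical predictors in two cohorts.
cohort_a = [0.30, 0.25, 0.20, 0.15, 0.10]
cohort_b = [0.28, 0.21, 0.24, 0.12, 0.09]

rho = spearman(cohort_a, cohort_b)
print(f"Spearman rho = {rho:.2f}")
```

A value near 1 means the two cohorts order the predictors almost identically; significance would additionally require a test such as a permutation test, which is omitted here.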
Title: Benchmarking computational variant effect predictors by their ability to infer human traits. Journal: Genome Biology 25(1):172. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11218265/pdf/
Pub Date: 2024-01-31. DOI: 10.1186/s13059-024-03175-0
Elena M Pugacheva, Dharmendra Nath Bhatt, Samuel Rivero-Hinojosa, Md Tajmul, Liron Fedida, Emma Price, Yon Ji, Dmitri Loukinov, Alexander V Strunnikov, Bing Ren, Victor V Lobanenkov
Background: Pervasive usage of alternative promoters leads to the deregulation of gene expression in carcinogenesis and may drive the emergence of new genes in spermatogenesis. However, little is known regarding the mechanisms underpinning the activation of alternative promoters.
Results: Here we describe how alternative cancer-testis-specific transcription is activated. We show that intergenic and intronic CTCF binding sites, which are transcriptionally inert in normal somatic cells, could be epigenetically reprogrammed into active de novo promoters in germ and cancer cells. BORIS/CTCFL, the testis-specific paralog of the ubiquitously expressed CTCF, triggers the epigenetic reprogramming of CTCF sites into units of active transcription. BORIS binding initiates the recruitment of the chromatin remodeling factor, SRCAP, followed by the replacement of H2A histone with H2A.Z, resulting in a more relaxed chromatin state in the nucleosomes flanking the CTCF binding sites. The relaxation of chromatin around CTCF binding sites facilitates the recruitment of multiple additional transcription factors, thereby activating transcription from a given binding site. We demonstrate that the epigenetically reprogrammed CTCF binding sites can drive the expression of cancer-testis genes, long noncoding RNAs, retro-pseudogenes, and dormant transposable elements.
Conclusions: Thus, BORIS functions as a transcription factor that epigenetically reprograms clustered CTCF binding sites into transcriptional start sites, promoting transcription from alternative promoters in both germ cells and cancer cells.
Title: BORIS/CTCFL epigenetically reprograms clustered CTCF binding sites into alternative transcriptional start sites. Journal: Genome Biology 25(1):40. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10832218/pdf/
Pub Date: 2024-01-31 DOI: 10.1186/s13059-024-03171-4
Sarah Fazal, Matt C Danzi, Isaac Xu, Shilpa Nadimpalli Kobren, Shamil Sunyaev, Chloe Reuter, Shruti Marwaha, Matthew Wheeler, Egor Dolzhenko, Francesca Lucas, Stefan Wuchty, Mustafa Tekin, Stephan Züchner, Vanessa Aguiar-Pulido
Expansions of tandem repeats (TRs) cause approximately 60 monogenic diseases. We expect that the discovery of additional pathogenic repeat expansions will narrow the diagnostic gap in many diseases. A growing number of TR expansions are being identified, and interpreting them is a challenge. We present RExPRT (Repeat EXpansion Pathogenicity pRediction Tool), a machine learning tool for distinguishing pathogenic from benign TR expansions. Our results demonstrate that an ensemble approach classifies TRs with an average precision of 93% and recall of 83%. RExPRT's high precision will be valuable in large-scale discovery studies, which require prioritization of candidate loci for follow-up studies.
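For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how precision and recall are computed for a binary pathogenic/benign classifier. It is not RExPRT's code: the function name `precision_recall` and the toy labels are hypothetical and chosen only for illustration.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 = pathogenic and 0 = benign."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # pathogenic, called pathogenic
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, called pathogenic
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # pathogenic, called benign
    # Precision: fraction of predicted-pathogenic loci that are truly pathogenic.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of truly pathogenic loci that the classifier recovers.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy labels, not data from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

High precision means few benign loci are flagged as pathogenic, which is the property the abstract highlights for prioritizing candidate loci in large-scale discovery studies.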
RExPRT: a machine learning tool to predict pathogenicity of tandem repeat loci. Sarah Fazal, Matt C Danzi, Isaac Xu, Shilpa Nadimpalli Kobren, Shamil Sunyaev, Chloe Reuter, Shruti Marwaha, Matthew Wheeler, Egor Dolzhenko, Francesca Lucas, Stefan Wuchty, Mustafa Tekin, Stephan Züchner, Vanessa Aguiar-Pulido. Genome Biology 25(1):39. Published 2024-01-31. DOI: 10.1186/s13059-024-03171-4. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10832122/pdf/