The false discovery rate (FDR) controlling method of Benjamini and Hochberg (BH) is a popular choice in the omics fields. Here, we demonstrate that in datasets with strong dependencies between features, FDR correction methods such as BH can sometimes, counter-intuitively, report very high numbers of false positives, potentially misleading researchers. We urge researchers to use suitable multiple testing strategies, together with approaches such as synthetic null data (negative controls), to identify and minimize false discoveries, because in the cases where false findings do occur, they may be numerous.
Beware of counter-intuitive levels of false discoveries in datasets with strong intra-correlations. Chakravarthi Kanduri, Maria Mamica, Emilie Willoch Olstad, Manuela Zucknick, Jingyi Jessica Li, Geir Kjetil Sandve. Genome Biology 26(1):249. Pub Date: 2025-08-18. DOI: 10.1186/s13059-025-03734-z. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12359981/pdf/
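The behaviour described in this abstract can be illustrated with a minimal sketch. It implements the standard BH step-up procedure and applies it to purely null test statistics that share one latent factor. The equicorrelated Gaussian model, the function names, and all parameter values are assumptions chosen for illustration; this is not the authors' simulation design.

```python
import math
import random


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the sorted indices of hypotheses rejected by the BH step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose p-value lies under the BH line rank * alpha / m
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k = rank
    return sorted(order[:k])


def simulate_null_rejections(m=1000, rho=0.9, n_datasets=200, alpha=0.05, seed=0):
    """Count BH rejections per dataset when every feature is null.

    Assumed toy model: z_j = sqrt(rho) * shared + sqrt(1 - rho) * noise_j,
    so each z_j is marginally N(0, 1) and any two features correlate at rho.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_datasets):
        shared = rng.gauss(0.0, 1.0)
        z = [
            math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(m)
        ]
        # Two-sided p-value for a standard normal statistic.
        p = [math.erfc(abs(v) / math.sqrt(2.0)) for v in z]
        counts.append(len(benjamini_hochberg(p, alpha)))
    return counts


if __name__ == "__main__":
    counts = simulate_null_rejections()
    with_hits = [c for c in counts if c > 0]
    print("datasets with any false positive:", len(with_hits), "of", len(counts))
    print("largest number of false positives in one dataset:", max(counts))
```

Because every rejection in this setup is a false positive, the per-dataset counts show directly whether false findings arrive rarely but in large batches. Setting `rho=0.0` gives the independent case for comparison.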
Pub Date: 2025-08-18. DOI: 10.1186/s13059-025-03708-1
Xiufei Chen, Jingfei Cheng, Linzhen Kong, Xiao Shu, Haiqi Xu, Masato Inoue, Marion Silvana Fernández-Berrocal, Dagny Sanden Døskeland, Magnar Bjørås, Shivan Sivakumar, Yibin Liu, Jing Ye, Chun-Xiao Song
We present direct sequencing methodologies, scTAPS for 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) and scCAPS+ specifically for 5hmC, enabling quantitative detection of 5mC and 5hmC at single-base resolution and single-cell level. Achieving approximately 90% mapping efficiency, our plate-based methods accurately recover 5mC and 5hmC profiles in CD8+ T cells and mouse embryonic stem cells. Notably, scCAPS+ reveals a global increase in 5hmC across neuronal and non-neuronal cells in the hippocampus of aging mice. Our methods offer strong potential for seamless integration into high-throughput single-cell multi-omics, facilitating future investigations of epigenomic dynamics in specific biological processes.
Direct and bisulfite-free 5-methylcytosine and 5-hydroxymethylcytosine sequencing at single-cell resolution with scTAPS and scCAPS+. Genome Biology 26(1):244. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12359873/pdf/
Pub Date: 2025-08-18. DOI: 10.1186/s13059-025-03715-2
Marina Goliasse, Aurore Johary, Adrian E Platts, Fabian Ortner-Krause, Patrick P Edger, Jae Young Choi, Michael D Purugganan, Zoé Joly-Lopez
Background: Efforts to characterize regulatory elements in plant genomes traditionally rely on evolutionary conservation and chromatin accessibility. Recently, intergenic bi-directional nascent transcription has emerged as a putative hallmark of active enhancers. Here, we integrate these approaches to better define the cis-regulatory landscape of the rice genome.
Results: In juvenile leaf tissues of the Azucena rice variety, we analyze conserved noncoding sequences, intergenic bi-directional transcripts, and regions of open chromatin. These three features highlight distinct classes of regulatory targets, each exhibiting complexity and regulatory roles. Conserved noncoding sequences are associated with more complex regulatory interactions, while regions marked by chromatin accessibility or bi-directional nascent transcription tend to promote more stable regulatory activity. Some transcribed regulatory sites harbor elements linked to transposable element silencing, whereas others correlate with increased expression of nearby genes, pointing to candidate transcribed regulatory elements. Using 3-dimensional chromatin contact data, we further identify physical interactions between transcribed intergenic regions and genic regions. These interactions often co-localize with expression quantitative trait loci and coincide with increased transcription, further supporting a regulatory role.
Conclusions: Our integrative analysis reveals multiple distinct classes of regulatory elements in the rice genome, with overlapping but non-identical targets and signatures. Many candidate elements share features consistent with transcriptional enhancement, though the specific criteria for defining active enhancers in plants require further characterization. These findings underscore the importance of using complementary genomic signals to discover and characterize functionally diverse regulatory elements in plant genomes.
Uncovering the multi-layer cis-regulatory landscape of rice via integrative nascent RNA analysis. Genome Biology 26(1):250. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12359857/pdf/
Pub Date: 2025-08-18. DOI: 10.1186/s13059-025-03721-4
Zhoujingpeng Wei, Guanhua Chen, Zheng-Zheng Tang
Standard protocols for meta-analysis of association studies are inadequate for microbiome data due to their complex compositional structure, leading to inaccurate and unstable microbial signature selection. To address this issue, we introduce Melody, a framework that generates, harmonizes, and combines study-specific summary association statistics to powerfully and robustly identify microbial signatures in meta-analysis. Comprehensive and realistic simulations demonstrate that Melody substantially outperforms existing approaches in prioritizing true signatures. In the meta-analyses of five studies on colorectal cancer and eight studies on the gut metabolome, we showcase the superior stability, reliability, and predictive performance of Melody-identified signatures.
Melody: meta-analysis of microbiome association studies for discovering generalizable microbial signatures. Genome Biology 26(1):245. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12359909/pdf/
Pub Date: 2024-11-28. DOI: 10.1186/s13059-024-03439-9
Thomas W Winkler, Simon Wiegrebe, Janina M Herold, Klaus J Stark, Helmut Küchenhoff, Iris M Heid
Background: Genome-wide association studies (GWAS) have identified thousands of loci for disease-related human traits in cross-sectional data. However, the impact of age on genetic effects is underacknowledged. Also, identifying genetic effects on longitudinal trait change has been hampered by small sample sizes for longitudinal data. Such effects on deteriorating trait levels over time or disease progression can be clinically relevant.
Results: Under certain assumptions, we demonstrate analytically that genetic-by-age interaction observed in cross-sectional data can be indicative of genetic association on longitudinal trait change. We propose a 2-stage approach with genome-wide pre-screening for genetic-by-age interaction in cross-sectional data and testing identified variants for longitudinal change in independent longitudinal data. Within UK Biobank cross-sectional data, we analyze 8 complex traits (up to 370,000 individuals). We identify 44 genetic-by-age interactions (7 loci for obesity traits, 26 for pulse pressure, few to none for lipids). Our cross-trait view reveals trait-specificity regarding the proportion of loci with age-modulated effects, which is particularly high for pulse pressure. Testing the 44 variants in longitudinal data (up to 50,000 individuals), we observe significant effects on change for obesity traits (near APOE, TMEM18, TFAP2B) and pulse pressure (near FBN1, IGFBP3; known for implication in arterial stiffness processes).
Conclusions: We provide analytical and empirical evidence that cross-sectional genetic-by-age interaction can help pinpoint longitudinal-change effects when the cross-sectional sample size surpasses the longitudinal one. Our findings shed light on the distinction between traits that are impacted by age-dependent genetic effects and those that are not.
Genetic-by-age interaction analyses on complex traits in UK Biobank and their potential to identify effects on longitudinal trait change. Genome Biology 25(1):300. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11606088/pdf/
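The link between a cross-sectional interaction term and longitudinal change can be made concrete with a simple linear model. This is a sketch for illustration only: the abstract does not state the paper's exact model or assumptions, and the symbols below (trait y, genotype dosage G, age a) are introduced here.

```latex
\begin{align}
  y_i &= \beta_0 + \beta_G G_i + \beta_A a_i + \beta_{GA}\, G_i a_i + \varepsilon_i, \\
  \frac{\partial\, \mathbb{E}[y_i \mid G_i, a_i]}{\partial a_i} &= \beta_A + \beta_{GA}\, G_i .
\end{align}
```

If age differences between individuals mirror within-person aging (for example, no cohort or survival effects, which is an assumption), then the interaction coefficient β_GA corresponds to the per-allele difference in trait change per year of age. That is the quantity a longitudinal analysis would test directly.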
Pub Date: 2024-11-14. DOI: 10.1186/s13059-024-03432-2
Tianyu Yuan, Hao Yan, Kevin C Li, Ivan Surovtsev, Megan C King, Simon G J Mochrie
Background: Inhomogeneous patterns of chromatin-chromatin contacts within 10-100-kb-sized regions of the genome are a generic feature of chromatin spatial organization. These features, termed topologically associating domains (TADs), have led to the loop extrusion factor (LEF) model. Currently, our ability to model TADs relies on the observation that in vertebrates TAD boundaries are correlated with DNA sequences that bind CTCF, which therefore is inferred to block loop extrusion. However, although TADs feature prominently in their Hi-C maps, non-vertebrate eukaryotes either do not express CTCF or show few TAD boundaries that correlate with CTCF sites. In all of these organisms, the counterparts of CTCF remain unknown, frustrating comparisons between Hi-C data and simulations.
Results: To extend the LEF model across the tree of life, we propose the conserved-current loop extrusion (CCLE) model, which interprets loop-extruding cohesin as a nearly conserved probability current. From cohesin ChIP-seq data alone, we derive a position-dependent loop extrusion rate, allowing for a modified paradigm for loop extrusion that goes beyond solely localized barriers to also include loop extrusion rates that vary continuously. We show that CCLE accurately predicts the TAD-scale Hi-C maps of interphase Schizosaccharomyces pombe, as well as those of meiotic and mitotic Saccharomyces cerevisiae, demonstrating its utility in organisms lacking CTCF.
Conclusions: The success of CCLE in yeasts suggests that loop extrusion by cohesin is indeed the primary mechanism underlying TADs in these systems. CCLE allows us to obtain loop extrusion parameters such as the LEF density and processivity, which compare well to independent estimates.
Cohesin distribution alone predicts chromatin organization in yeast via conserved-current loop extrusion. Genome Biology 25(1):293. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11566905/pdf/
Messenger RNA splicing and degradation are critical for gene expression regulation, the abnormality of which leads to diseases. Previous methods for estimating kinetic rates have limitations, assuming uniform rates across cells. DeepKINET is a deep generative model that estimates splicing and degradation rates at single-cell resolution from scRNA-seq data. DeepKINET outperforms existing methods on simulated and metabolic labeling datasets. Applied to forebrain and breast cancer data, it identifies RNA-binding proteins responsible for kinetic rate diversity. DeepKINET also analyzes the effects of splicing factor mutations on target genes in erythroid lineage cells. DeepKINET effectively reveals cellular heterogeneity in post-transcriptional regulation.
DeepKINET: a deep generative model for estimating single-cell RNA splicing and degradation rates. Chikara Mizukoshi, Yasuhiro Kojima, Satoshi Nomura, Shuto Hayashi, Ko Abe, Teppei Shimamura. Genome Biology 25(1):229. Pub Date: 2024-09-06. DOI: 10.1186/s13059-024-03367-8. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11378460/pdf/
Pub Date: 2024-08-26. DOI: 10.1186/s13059-024-03371-y
Agustín Amalfitano, Nicolás Stocchi, Hugo Marcelo Atencio, Fernando Villarreal, Arjen Ten Have
Seqrutinator is an objective, flexible pipeline that removes sequences with sequencing and/or gene model errors and sequences from pseudogenes from complex, eukaryotic protein superfamilies. Testing Seqrutinator on major superfamilies BAHD, CYP, and UGT removes only 1.94% of SwissProt entries, 14% of entries from the model plant Arabidopsis thaliana, but 80% of entries from Pinus taeda's recent complete proteome. Application of Seqrutinator on crude BAHDomes, CYPomes, and UGTomes obtained from 16 plant proteomes shows convergence of the numbers of paralogues. MSAs, phylogenies, and particularly functional clustering improve drastically upon Seqrutinator application, indicating good performance.
Seqrutinator: scrutiny of large protein superfamily sequence datasets for the identification and elimination of non-functional homologues. Genome Biology 25(1):230. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11346255/pdf/
Pub Date: 2024-07-15. DOI: 10.1186/s13059-024-03307-6
Wen-Jou Chang, Maria S Baker, Eleonora Laritsky, Chathura J Gunasekara, Uditha Maduranga, Justine C Galliou, Joseph W McFadden, Jessica R Waltemyer, Bruce Berggren-Thomas, Brianna N Tate, Hanxue Zhang, Benjamin D Rosen, Curtis P Van Tassell, George E Liu, Cristian Coarfa, Yi Athena Ren, Robert A Waterland
Background: We recently identified ~10,000 correlated regions of systemic interindividual epigenetic variation (CoRSIVs) in the human genome. These methylation variants are amenable to population studies, as DNA methylation measurements in blood provide information on epigenetic regulation throughout the body. Moreover, establishment of DNA methylation at human CoRSIVs is labile to periconceptional influences such as nutrition. Here, we analyze publicly available whole-genome bisulfite sequencing data on multiple tissues of each of two Holstein cows to determine whether CoRSIVs exist in cattle.
Results: Focusing on genomic blocks with ≥ 5 CpGs and a systemic interindividual variation index of at least 20, our approach identifies 217 cattle CoRSIVs, a subset of which we independently validate by bisulfite pyrosequencing. Similar to human CoRSIVs, those in cattle are strongly associated with genetic variation. Also as in humans, we show that establishment of DNA methylation at cattle CoRSIVs is particularly sensitive to early embryonic environment, in the context of embryo culture during assisted reproduction.
Conclusions: Our data indicate that CoRSIVs exist in cattle, as in humans, suggesting these systemic epigenetic variants may be common to mammals in general. To the extent that individual epigenetic variation at cattle CoRSIVs affects phenotypic outcomes, assessment of CoRSIV methylation at birth may become an important tool for optimizing agriculturally important traits. Moreover, adjusting embryo culture conditions during assisted reproduction may provide opportunities to tailor agricultural outcomes by engineering CoRSIV methylation profiles.
Systemic interindividual DNA methylation variants in cattle share major hallmarks with those in humans. Genome Biology 25(1):185. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11247883/pdf/
Pathogenic allele silencing is a promising treatment for hereditary genetic diseases. Here, we develop an RNA-cleaving tool, TaqTth-hpRNA, consisting of a small, chimeric TaqTth and a hairpin RNA guiding probe. With a minimal flanking sequence-motif requirement, in vitro and in vivo studies show that TaqTth-hpRNA cleaves RNA efficiently and specifically. In an Alzheimer's disease model, we demonstrate silencing of mutant APPswe mRNA without altering the wild-type APP mRNA. Notably, due to the compact size of TaqTth, we are able to combine it with APOE2 overexpression in a single AAV vector, which results in stronger inhibition of pathologies.
TaqTth-hpRNA: a novel compact RNA-targeting tool for specific silencing of pathogenic mRNA. Chong Xu, Jiyanuo Cao, Huanran Qiang, Yu Liu, Jialin Wu, Qiudan Luo, Meng Wan, Yujie Wang, Peiliang Wang, Qian Cheng, Guohua Zhou, Jian Sima, Yongjian Guo, Shu Xu. Genome Biology 25(1):179. Pub Date: 2024-07-07. DOI: 10.1186/s13059-024-03326-3. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11229350/pdf/