Pub Date: 2025-08-18; DOI: 10.1186/s13059-025-03707-2
Xavier Roca-Rada, Roberta Davidson, Matthew P Williams, Vanessa Villalba-Mouco, António Faustino Carvalho, Shyamsundar Ravishankar, Evelyn Collen, Christian Haarkötter, Leonard Taufik, Daniel R Cuesta-Aguirre, Catarina Tente, Álvaro M Monge Calleja, Rebecca Anne MacRoberts, Linda Melo, Gludhug A Purnomo, Yassine Souilmi, Raymond Tobler, Eugénia Cunha, Sofia Tereso, Vítor M J Matos, Teresa Matos Fernandes, Anne-France Maurer, Ana Maria Silva, Pedro C Carvalho, Bastien Llamas, João C Teixeira
Background: Recent ancient DNA studies uncovering large-scale demographic events in Iberia have presented very limited data for Portugal, a country located at the westernmost edge of continental Eurasia. Here, we present the most comprehensive collection of Portuguese ancient genome-wide data, from 67 individuals spanning 5000 years of human history, from the Neolithic to the nineteenth century.
Results: We identify early admixture between local hunter-gatherers and Anatolian-related farmers in Neolithic Portugal, with a northeastern-southwestern gradient of increasing Magdalenian-associated ancestry persistence in Iberia. This profile continues into the Chalcolithic, though Bell Beaker-associated sites reveal Portugal's first evidence of Steppe-related ancestry. Such ancestry has a broader demographic impact during the Bronze Age, despite continuity of local Chalcolithic genetic ancestry and limited Mediterranean connections. The village of Idanha-a-Velha emerges in the Roman period as a site of significant migration and interaction, presenting a notably diverse genetic profile that includes North African and Eastern Mediterranean ancestries. The Early Medieval period is marked by the arrival of Central European genetic diversity, likely linked to migrations of Germanic tribes, adding to coeval local, African, and Mediterranean influences. The Islamic and Christian Conquest periods show strong genetic continuity in northern Portugal and significant additional African admixture in the south. The latter remains stable during the post-Islamic period, suggesting enduring African influences.
Conclusions: We reveal dynamic patterns of migration in line with cultural exchange across millennia, but also the persistence of local ancestries. Our findings integrate genetic information with historical and archeological data, enhancing our understanding of Iberia's biological and cultural heritage.
{"title":"The genetic history of Portugal over the past 5,000 years.","authors":"Xavier Roca-Rada, Roberta Davidson, Matthew P Williams, Vanessa Villalba-Mouco, António Faustino Carvalho, Shyamsundar Ravishankar, Evelyn Collen, Christian Haarkötter, Leonard Taufik, Daniel R Cuesta-Aguirre, Catarina Tente, Álvaro M Monge Calleja, Rebecca Anne MacRoberts, Linda Melo, Gludhug A Purnomo, Yassine Souilmi, Raymond Tobler, Eugénia Cunha, Sofia Tereso, Vítor M J Matos, Teresa Matos Fernandes, Anne-France Maurer, Ana Maria Silva, Pedro C Carvalho, Bastien Llamas, João C Teixeira","doi":"10.1186/s13059-025-03707-2","journal":{"name":"Genome Biology","volume":"26 1","pages":"248"},"publicationDate":"2025-08-18","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12360031/pdf/"}
Pub Date: 2025-08-18; DOI: 10.1186/s13059-025-03732-1
Thomas D Lewin, Isabel Jiah-Yih Liao, Yi-Jyun Luo
Species from diverse animal lineages have conserved groups of orthologous genes together on the same chromosome for over half a billion years since the last common ancestor of bilaterians. Although notable exceptions exist, the stability of chromosome-scale gene linkages has been proposed to be the norm among animals. Here we test this hypothesis across species from 52 bilaterian classes representing 15 different phyla. Contrary to expectations, we find that cases of genome structure conservation are rare, taxonomically restricted and unrepresentative of the general state of bilaterian genomes. Genome restructuring correlates with increased rates of protein sequence evolution and may be an underappreciated factor driving animal adaptation and diversification.
{"title":"Conservation of bilaterian genome structure is the exception, not the rule.","authors":"Thomas D Lewin, Isabel Jiah-Yih Liao, Yi-Jyun Luo","doi":"10.1186/s13059-025-03732-1","journal":{"name":"Genome Biology","volume":"26 1","pages":"247"},"publicationDate":"2025-08-18","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12359876/pdf/"}
The false discovery rate (FDR) controlling method of Benjamini and Hochberg (BH) is a popular choice in the omics fields. Here, we demonstrate that in datasets with strong dependencies between features, FDR correction methods such as BH can sometimes, counter-intuitively, report very high numbers of false positives, potentially misleading researchers. We urge researchers to use suitable multiple-testing strategies and approaches such as synthetic null data (negative controls) to identify and minimize caveats related to false discoveries, because when false findings do occur, they may be numerous.
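The effect described above can be explored with a small synthetic-null simulation. The following is a minimal sketch, not the authors' analysis: it assumes a two-group comparison with no true effect, features that all share one latent factor (pairwise correlation `rho`), and a z-test with known unit variance. The function names and parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the sorted indices of hypotheses rejected by the BH step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    # Step-up rule: find the largest rank k with p_(k) <= alpha * k / m.
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            k_max = rank
    return sorted(order[:k_max])


def null_pvalues(m, n_per_group, rho, rng):
    """Two-sided z-test p-values for m features with NO true group difference.

    Assumption of this sketch: every sample has one shared latent factor,
    so any two features have correlation rho and each feature has variance 1.
    """
    def group_means():
        sums = [0.0] * m
        for _ in range(n_per_group):
            shared = rng.gauss(0, 1)
            for j in range(m):
                sums[j] += math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0, 1)
        return [s / n_per_group for s in sums]

    mean_a, mean_b = group_means(), group_means()
    se = math.sqrt(2.0 / n_per_group)  # standard error of the mean difference
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return [math.erfc(abs(a - b) / se / math.sqrt(2)) for a, b in zip(mean_a, mean_b)]


def false_positive_counts(m=200, n_per_group=10, rho=0.9, reps=200, seed=1):
    """Number of BH rejections (all false) in each of `reps` synthetic null datasets."""
    rng = random.Random(seed)
    return [len(benjamini_hochberg(null_pvalues(m, n_per_group, rho, rng)))
            for _ in range(reps)]


if __name__ == "__main__":
    for rho in (0.0, 0.9):
        counts = false_positive_counts(rho=rho)
        print(f"rho={rho}: datasets with any rejection={sum(c > 0 for c in counts)}, "
              f"largest number of false positives={max(counts)}")
```

Because every rejection in this setup is a false positive, comparing the distribution of counts at `rho=0.0` and `rho=0.9` gives a rough feel for how dependence can concentrate false discoveries into a few datasets where they are numerous. A real negative-control analysis would generate the synthetic null from the actual data structure rather than from this toy model.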
{"title":"Beware of counter-intuitive levels of false discoveries in datasets with strong intra-correlations.","authors":"Chakravarthi Kanduri, Maria Mamica, Emilie Willoch Olstad, Manuela Zucknick, Jingyi Jessica Li, Geir Kjetil Sandve","doi":"10.1186/s13059-025-03734-z","journal":{"name":"Genome Biology","volume":"26 1","pages":"249"},"publicationDate":"2025-08-18","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12359981/pdf/"}
Pub Date: 2025-08-18; DOI: 10.1186/s13059-025-03708-1
Xiufei Chen, Jingfei Cheng, Linzhen Kong, Xiao Shu, Haiqi Xu, Masato Inoue, Marion Silvana Fernández-Berrocal, Dagny Sanden Døskeland, Magnar Bjørås, Shivan Sivakumar, Yibin Liu, Jing Ye, Chun-Xiao Song
We present direct sequencing methodologies, scTAPS for 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) and scCAPS+ specifically for 5hmC, enabling quantitative detection of 5mC and 5hmC at single-base resolution and single-cell level. Achieving approximately 90% mapping efficiency, our plate-based methods accurately recover 5mC and 5hmC profiles in CD8+ T cells and mouse embryonic stem cells. Notably, scCAPS+ reveals a global increase in 5hmC across neuronal and non-neuronal cells in the hippocampus of aging mice. Our methods offer strong potential for seamless integration into high-throughput single-cell multi-omics, facilitating future investigations of epigenomic dynamics in specific biological processes.
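To make "quantitative detection at single-base resolution" concrete, here is a minimal sketch of how a per-site modification level could be computed from read counts. It assumes a direct-conversion read-out in which modified cytosines are sequenced as T and unmodified cytosines remain C. The abstract does not spell out the read-out, and the function names and input layout are hypothetical.

```python
def modification_level(converted_reads, unconverted_reads):
    """Fraction of reads at one cytosine that show the converted (C-to-T) signal.

    Returns None when the site has no coverage, which is common in sparse
    single-cell data.
    """
    total = converted_reads + unconverted_reads
    if total == 0:
        return None
    return converted_reads / total


def per_cell_levels(site_counts):
    """Map {site: (converted, unconverted)} to {site: level}, skipping uncovered sites."""
    levels = {}
    for site, (converted, unconverted) in site_counts.items():
        level = modification_level(converted, unconverted)
        if level is not None:
            levels[site] = level
    return levels


if __name__ == "__main__":
    # Hypothetical counts for one cell; the site labels are illustrative only.
    cell = {"chr1:100": (3, 1), "chr1:250": (0, 2), "chr1:900": (0, 0)}
    print(per_cell_levels(cell))
```

Under this assumption a site's level is simply converted / (converted + unconverted). With single-cell coverage most sites have zero or one read, so in practice levels are usually aggregated over regions or cells before interpretation.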
{"title":"Direct and bisulfite-free 5-methylcytosine and 5-hydroxymethylcytosine sequencing at single-cell resolution with scTAPS and scCAPS+.","authors":"Xiufei Chen, Jingfei Cheng, Linzhen Kong, Xiao Shu, Haiqi Xu, Masato Inoue, Marion Silvana Fernández-Berrocal, Dagny Sanden Døskeland, Magnar Bjørås, Shivan Sivakumar, Yibin Liu, Jing Ye, Chun-Xiao Song","doi":"10.1186/s13059-025-03708-1","journal":{"name":"Genome Biology","volume":"26 1","pages":"244"},"publicationDate":"2025-08-18","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12359873/pdf/"}
Pub Date: 2025-08-18; DOI: 10.1186/s13059-025-03715-2
Marina Goliasse, Aurore Johary, Adrian E Platts, Fabian Ortner-Krause, Patrick P Edger, Jae Young Choi, Michael D Purugganan, Zoé Joly-Lopez
Background: Efforts to characterize regulatory elements in plant genomes traditionally rely on evolutionary conservation and chromatin accessibility. Recently, intergenic bi-directional nascent transcription has emerged as a putative hallmark of active enhancers. Here, we integrate these approaches to better define the cis-regulatory landscape of the rice genome.
Results: In juvenile leaf tissues of the Azucena rice variety, we analyze conserved noncoding sequences, intergenic bi-directional transcripts, and regions of open chromatin. These three features highlight distinct classes of regulatory targets, each with its own complexity and regulatory roles. Conserved noncoding sequences are associated with more complex regulatory interactions, while regions marked by chromatin accessibility or bi-directional nascent transcription tend to promote more stable regulatory activity. Some transcribed regulatory sites harbor elements linked to transposable element silencing, whereas others correlate with increased expression of nearby genes, pointing to candidate transcribed regulatory elements. Using 3-dimensional chromatin contact data, we further identify physical interactions between transcribed intergenic regions and genic regions. These interactions often co-localize with expression quantitative trait loci and coincide with increased transcription, further supporting a regulatory role.
Conclusions: Our integrative analysis reveals multiple distinct classes of regulatory elements in the rice genome, with overlapping but non-identical targets and signatures. Many candidate elements share features consistent with transcriptional enhancement, though the specific criteria for defining active enhancers in plants require further characterization. These findings underscore the importance of using complementary genomic signals to discover and characterize functionally diverse regulatory elements in plant genomes.
{"title":"Uncovering the multi-layer cis-regulatory landscape of rice via integrative nascent RNA analysis.","authors":"Marina Goliasse, Aurore Johary, Adrian E Platts, Fabian Ortner-Krause, Patrick P Edger, Jae Young Choi, Michael D Purugganan, Zoé Joly-Lopez","doi":"10.1186/s13059-025-03715-2","journal":{"name":"Genome Biology","volume":"26 1","pages":"250"},"publicationDate":"2025-08-18","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12359857/pdf/"}
Pub Date: 2025-08-18; DOI: 10.1186/s13059-025-03721-4
Zhoujingpeng Wei, Guanhua Chen, Zheng-Zheng Tang
Standard protocols for meta-analysis of association studies are inadequate for microbiome data due to their complex compositional structure, leading to inaccurate and unstable microbial signature selection. To address this issue, we introduce Melody, a framework that generates, harmonizes, and combines study-specific summary association statistics to powerfully and robustly identify microbial signatures in meta-analysis. Comprehensive and realistic simulations demonstrate that Melody substantially outperforms existing approaches in prioritizing true signatures. In the meta-analyses of five studies on colorectal cancer and eight studies on the gut metabolome, we showcase the superior stability, reliability, and predictive performance of Melody-identified signatures.
{"title":"Melody: meta-analysis of microbiome association studies for discovering generalizable microbial signatures.","authors":"Zhoujingpeng Wei, Guanhua Chen, Zheng-Zheng Tang","doi":"10.1186/s13059-025-03721-4","journal":{"name":"Genome Biology","volume":"26 1","pages":"245"},"publicationDate":"2025-08-18","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12359909/pdf/"}
Pub Date: 2024-11-28; DOI: 10.1186/s13059-024-03439-9
Thomas W Winkler, Simon Wiegrebe, Janina M Herold, Klaus J Stark, Helmut Küchenhoff, Iris M Heid
Background: Genome-wide association studies (GWAS) have identified thousands of loci for disease-related human traits in cross-sectional data. However, the impact of age on genetic effects is underacknowledged. Also, identifying genetic effects on longitudinal trait change has been hampered by small sample sizes for longitudinal data. Such effects on deteriorating trait levels over time or disease progression can be clinically relevant.
Results: Under certain assumptions, we demonstrate analytically that genetic-by-age interaction observed in cross-sectional data can be indicative of genetic association with longitudinal trait change. We propose a 2-stage approach: genome-wide pre-screening for genetic-by-age interaction in cross-sectional data, followed by testing the identified variants for longitudinal change in independent longitudinal data. Within UK Biobank cross-sectional data, we analyze 8 complex traits (up to 370,000 individuals). We identify 44 genetic-by-age interactions (7 loci for obesity traits, 26 for pulse pressure, few to none for lipids). Our cross-trait view reveals trait-specificity regarding the proportion of loci with age-modulated effects, which is particularly high for pulse pressure. Testing the 44 variants in longitudinal data (up to 50,000 individuals), we observe significant effects on change for obesity traits (near APOE, TMEM18, TFAP2B) and pulse pressure (near FBN1, IGFBP3; genes known to be implicated in arterial stiffness processes).
Conclusions: We provide analytical and empirical evidence that cross-sectional genetic-by-age interaction can help pinpoint longitudinal-change effects when the cross-sectional sample size surpasses the longitudinal one. Our findings shed light on the distinction between traits that are impacted by age-dependent genetic effects and those that are not.
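The link between cross-sectional interaction and longitudinal change can be illustrated with a simple linear model. This is a sketch under stated assumptions, not the authors' derivation: the symbols are introduced here for illustration ($Y$ the trait, $G$ the allele dosage, $A$ age), and the model assumes that the cross-sectional age trend reflects within-person aging, with no cohort or survival effects.

```latex
% Illustrative model; all symbols are assumptions of this sketch.
\begin{align}
\mathbb{E}[Y \mid G, A] &= \beta_0 + \beta_G\, G + \beta_A\, A + \beta_{GA}\, G A \\
\frac{\partial\, \mathbb{E}[Y \mid G, A]}{\partial A} &= \beta_A + \beta_{GA}\, G
\end{align}
```

Under these assumptions, the interaction coefficient $\beta_{GA}$ is the per-allele difference in the expected annual trait change, which is why a variant screened for $\beta_{GA} \neq 0$ in cross-sectional data is a natural candidate to test for an effect on change in longitudinal data.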
{"title":"Genetic-by-age interaction analyses on complex traits in UK Biobank and their potential to identify effects on longitudinal trait change.","authors":"Thomas W Winkler, Simon Wiegrebe, Janina M Herold, Klaus J Stark, Helmut Küchenhoff, Iris M Heid","doi":"10.1186/s13059-024-03439-9","journal":{"name":"Genome Biology","volume":"25 1","pages":"300"},"publicationDate":"2024-11-28","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11606088/pdf/"}
Pub Date: 2024-11-14; DOI: 10.1186/s13059-024-03432-2
Tianyu Yuan, Hao Yan, Kevin C Li, Ivan Surovtsev, Megan C King, Simon G J Mochrie
Background: Inhomogeneous patterns of chromatin-chromatin contacts within 10-100-kb-sized regions of the genome are a generic feature of chromatin spatial organization. These features, termed topologically associating domains (TADs), have led to the loop extrusion factor (LEF) model. Currently, our ability to model TADs relies on the observation that in vertebrates TAD boundaries are correlated with DNA sequences that bind CTCF, which therefore is inferred to block loop extrusion. However, although TADs feature prominently in their Hi-C maps, non-vertebrate eukaryotes either do not express CTCF or show few TAD boundaries that correlate with CTCF sites. In all of these organisms, the counterparts of CTCF remain unknown, frustrating comparisons between Hi-C data and simulations.
Results: To extend the LEF model across the tree of life, we propose the conserved-current loop extrusion (CCLE) model, which interprets loop-extruding cohesin as a nearly conserved probability current. From cohesin ChIP-seq data alone, we derive a position-dependent loop extrusion rate. This supports a modified paradigm for loop extrusion that goes beyond solely localized barriers to include loop extrusion rates that vary continuously. We show that CCLE accurately predicts the TAD-scale Hi-C maps of interphase Schizosaccharomyces pombe, as well as those of meiotic and mitotic Saccharomyces cerevisiae, demonstrating its utility in organisms lacking CTCF.
Conclusions: The success of CCLE in yeasts suggests that loop extrusion by cohesin is indeed the primary mechanism underlying TADs in these systems. CCLE allows us to obtain loop extrusion parameters such as the LEF density and processivity, which compare well to independent estimates.
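The idea of reading an extrusion rate off an occupancy profile can be illustrated with the generic one-dimensional continuity equation. This is a textbook steady-state sketch, not the CCLE equations themselves: $\rho(x)$ stands for cohesin occupancy along the genome (as a ChIP-seq profile might approximate), $v(x)$ for a local extrusion rate, and $J$ for the probability current. All symbols are assumptions introduced here.

```latex
% Generic steady-state argument; not taken from the paper.
\begin{align}
\frac{\partial \rho(x,t)}{\partial t} + \frac{\partial J(x,t)}{\partial x} &= 0,
  & J(x,t) &= v(x)\,\rho(x,t) \\
\frac{\partial \rho}{\partial t} = 0 \;&\Rightarrow\; J(x) = J_0,
  & v(x) &= \frac{J_0}{\rho(x)}
\end{align}
```

In this simplified picture a conserved current implies that extrusion is slow where occupancy is high and fast where occupancy is low, so a continuously varying occupancy profile maps onto a continuously varying rate rather than onto a set of discrete barriers.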
{"title":"Cohesin distribution alone predicts chromatin organization in yeast via conserved-current loop extrusion.","authors":"Tianyu Yuan, Hao Yan, Kevin C Li, Ivan Surovtsev, Megan C King, Simon G J Mochrie","doi":"10.1186/s13059-024-03432-2","journal":{"name":"Genome Biology","volume":"25 1","pages":"293"},"publicationDate":"2024-11-14","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11566905/pdf/"}
Messenger RNA splicing and degradation are critical for gene expression regulation, and their dysregulation leads to disease. Previous methods for estimating kinetic rates are limited because they assume uniform rates across cells. DeepKINET is a deep generative model that estimates splicing and degradation rates at single-cell resolution from scRNA-seq data. DeepKINET outperforms existing methods on simulated and metabolic labeling datasets. Applied to forebrain and breast cancer data, it identifies RNA-binding proteins responsible for kinetic rate diversity. DeepKINET also analyzes the effects of splicing factor mutations on target genes in erythroid lineage cells. DeepKINET effectively reveals cellular heterogeneity in post-transcriptional regulation.
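For orientation, the kinetic rates in question are usually defined through the standard two-state RNA velocity model of unspliced and spliced transcripts. The abstract does not state DeepKINET's equations, so the following is the conventional formulation with cell-specific rates written in as an assumption: $u_{c,g}$ and $s_{c,g}$ are the unspliced and spliced abundances of gene $g$ in cell $c$, and $\alpha$, $\beta$, $\gamma$ are the transcription, splicing, and degradation rates.

```latex
% Conventional RNA velocity kinetics; the per-cell indices are an assumption of this sketch.
\begin{align}
\frac{du_{c,g}}{dt} &= \alpha_{c,g} - \beta_{c,g}\, u_{c,g} \\
\frac{ds_{c,g}}{dt} &= \beta_{c,g}\, u_{c,g} - \gamma_{c,g}\, s_{c,g}
\end{align}
```

Methods that assume uniform rates fix $\beta_{c,g} = \beta_g$ and $\gamma_{c,g} = \gamma_g$ for all cells. Allowing both to vary with $c$ is what "single-cell resolution" would mean in this notation.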
{"title":"DeepKINET: a deep generative model for estimating single-cell RNA splicing and degradation rates.","authors":"Chikara Mizukoshi, Yasuhiro Kojima, Satoshi Nomura, Shuto Hayashi, Ko Abe, Teppei Shimamura","doi":"10.1186/s13059-024-03367-8","DOIUrl":"10.1186/s13059-024-03367-8","url":null,"abstract":"<p><p>Messenger RNA splicing and degradation are critical for gene expression regulation, the abnormality of which leads to diseases. Previous methods for estimating kinetic rates have limitations, assuming uniform rates across cells. DeepKINET is a deep generative model that estimates splicing and degradation rates at single-cell resolution from scRNA-seq data. DeepKINET outperforms existing methods on simulated and metabolic labeling datasets. Applied to forebrain and breast cancer data, it identifies RNA-binding proteins responsible for kinetic rate diversity. DeepKINET also analyzes the effects of splicing factor mutations on target genes in erythroid lineage cells. DeepKINET effectively reveals cellular heterogeneity in post-transcriptional regulation.</p>","PeriodicalId":48922,"journal":{"name":"Genome Biology","volume":"25 1","pages":"229"},"PeriodicalIF":12.3,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11378460/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142141562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-26 DOI: 10.1186/s13059-024-03371-y
Agustín Amalfitano, Nicolás Stocchi, Hugo Marcelo Atencio, Fernando Villarreal, Arjen Ten Have
Seqrutinator is an objective, flexible pipeline that removes sequences with sequencing and/or gene model errors, as well as sequences from pseudogenes, from complex eukaryotic protein superfamilies. When tested on the major superfamilies BAHD, CYP, and UGT, Seqrutinator removes only 1.94% of SwissProt entries and 14% of entries from the model plant Arabidopsis thaliana, but 80% of entries from the recent complete proteome of Pinus taeda. Application of Seqrutinator to crude BAHDomes, CYPomes, and UGTomes obtained from 16 plant proteomes shows convergence of the numbers of paralogues. MSAs, phylogenies, and particularly functional clustering improve drastically upon Seqrutinator application, indicating good performance.
{"title":"Seqrutinator: scrutiny of large protein superfamily sequence datasets for the identification and elimination of non-functional homologues.","authors":"Agustín Amalfitano, Nicolás Stocchi, Hugo Marcelo Atencio, Fernando Villarreal, Arjen Ten Have","doi":"10.1186/s13059-024-03371-y","DOIUrl":"10.1186/s13059-024-03371-y","url":null,"abstract":"<p><p>Seqrutinator is an objective, flexible pipeline that removes sequences with sequencing and/or gene model errors and sequences from pseudogenes from complex, eukaryotic protein superfamilies. Testing Seqrutinator on major superfamilies BAHD, CYP, and UGT removes only 1.94% of SwissProt entries, 14% of entries from the model plant Arabidopsis thaliana, but 80% of entries from Pinus taeda's recent complete proteome. Application of Seqrutinator on crude BAHDomes, CYPomes, and UGTomes obtained from 16 plant proteomes shows convergence of the numbers of paralogues. MSAs, phylogenies, and particularly functional clustering improve drastically upon Seqrutinator application, indicating good performance.</p>","PeriodicalId":48922,"journal":{"name":"Genome Biology","volume":"25 1","pages":"230"},"PeriodicalIF":12.3,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11346255/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142074283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}