Paired-class homeodomain transcription factors (HD TFs) play essential roles in vertebrate development, and their mutations are linked to human diseases. One unique feature of paired-class HDs is cooperative dimerization on specific palindromic DNA sequences. Yet the functional significance of HD cooperative dimerization in animal development, and of its dysregulation in disease, remains elusive. Using the retinal TF Cone-rod Homeobox (CRX) as a model, we have studied how two blindness-causing mutations in the paired HD, p.E80A and p.K88N, alter CRX’s cooperative dimerization and cause gene misexpression and photoreceptor developmental deficits in a dominant manner. CRXE80A maintains binding at monomeric WT CRX motifs but is deficient in cooperative binding at dimeric motifs. CRXE80A’s cooperativity defect impacts the exponential increase of photoreceptor gene expression in terminal differentiation and produces immature, non-functional photoreceptors in the CrxE80A retinas. CRXK88N is highly cooperative and localizes to ectopic genomic sites with strong enrichment of dimeric HD motifs. CRXK88N’s altered biochemical properties disrupt CRX’s ability to direct dynamic chromatin remodeling during development to activate photoreceptor differentiation programs and silence progenitor programs. Our study provides in vitro and in vivo molecular evidence that paired-class HD cooperative dimerization regulates neuronal development and that dysregulation of cooperative binding contributes to severe dominant blinding retinopathies.
Aberrant homeodomain-DNA cooperative dimerization underlies distinct developmental defects in two dominant CRX retinopathy models. Yiqiao Zheng, Gary D. Stormo, Shiming Chen. Genome Research. Published 2024-12-23. doi:10.1101/gr.279340.124 (https://doi.org/10.1101/gr.279340.124)
Song Zhang, Chao Wang, Shenghua Qin, Choulin Chen, Yongzhou Bao, Yuanyuan Zhang, Lingna Xu, Qingyou Liu, Yunxiang Zhao, Kui Li, Zhonglin Tang, Yuwen Liu
Super-enhancers (SEs) govern the expression of genes defining cell identity. However, the dynamic landscape of SEs and their critical constituent enhancers involved in skeletal muscle development remains unclear. In this study, using pig as a model, we employed cleavage under targets and tagmentation (CUT&Tag) to profile the enhancer-associated histone modification marker H3K27ac in skeletal muscle across two prenatal and three postnatal stages, and investigated how SEs influence skeletal muscle development. We identify three SE families with distinct temporal dynamics: continuous (Con, 397), transient (TS, 434), and de novo (DN, 756). These SE families are associated with different temporal gene expression trajectories, biological functions, and DNA methylation levels. Notably, several lines of evidence suggest a potential prominent role of Con SEs in regulating porcine muscle development and meat traits. To pinpoint key cis-regulatory units in Con SEs, we developed an integrative approach that leverages information from eRNA annotation, genome-wide association study (GWAS) signals, and high-throughput capture self-transcribing active regulatory region sequencing (STARR-seq) experiments. Within Con SEs, we identify 20 candidate critical enhancers with meat and carcass-associated DNA variations that affect enhancer activity, and infer their upstream transcription factors and downstream target genes. As a proof of concept, we experimentally validate the role of one such enhancer and its potential target gene during myogenesis. Our findings reveal the dynamic regulatory features of SEs in skeletal muscle development and provide a general integrative framework for identifying critical enhancers underlying the formation of complex traits.
Analyzing super-enhancer temporal dynamics reveals potential critical enhancers and their gene regulatory networks underlying skeletal muscle development. Genome Research, pages 2190-2202. Published 2024-12-23. doi:10.1101/gr.278344.123. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11694746/pdf/
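The super-enhancer abstract above groups SEs into continuous (Con), transient (TS), and de novo (DN) families by their behavior across five developmental stages, but it does not spell out the assignment rule. The sketch below is one plausible reading, offered only as an illustration: the function name `classify_se`, the boolean presence vector, and the rule itself are assumptions, not the authors' method.

```python
from typing import Sequence


def classify_se(present: Sequence[bool]) -> str:
    """Label one super-enhancer from presence/absence calls at ordered stages.

    Hypothetical rule (an assumption, not taken from the paper):
      "Con"    called at every stage
      "DN"     absent at the first stage and, once gained, kept to the last stage
      "TS"     called at some stage but no longer called at the last stage
      "other"  any remaining pattern (e.g. lost and regained)
      "absent" never called
    """
    calls = list(present)
    if not any(calls):
        return "absent"
    if all(calls):
        return "Con"
    first_on = calls.index(True)
    if not calls[0] and all(calls[first_on:]):
        return "DN"
    if not calls[-1]:
        return "TS"
    return "other"


if __name__ == "__main__":
    # Two prenatal followed by three postnatal stages, in temporal order.
    print(classify_se([True, True, True, True, True]))
    print(classify_se([False, False, True, True, True]))
    print(classify_se([True, True, False, False, False]))
```

A real analysis would start from H3K27ac-based SE calls merged across stages; the boolean vector here stands in for that step.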
Alejandro Paniagua, Cristina Agustin-García, Francisco J Pardo-Palacios, Thomas Brown, Maite De Maria, Nancy D Denslow, Camila Mazzoni, Ana Conesa
While the production of a draft genome has become more accessible due to long-read sequencing, the annotation of these new genomes has not been developed at the same pace. Long-read RNA sequencing (lrRNA-seq) offers a promising solution for enhancing gene annotation. In this study, we explore how sequencing platforms, Oxford Nanopore R9.4.1 chemistry or PacBio Sequel II CCS, and data processing methods influence evidence-driven genome annotation using long reads. Incorporating PacBio transcripts into our annotation pipeline significantly outperformed traditional methods, such as ab initio predictions and short-read-based annotations. We applied this strategy to a nonmodel species, the Florida manatee, and compared our results to the existing short-read-based annotation. At the locus level, both annotations were highly concordant, with 90% agreement. However, at the transcript level, the agreement was only 35%. We identified 4,906 novel loci, represented by 5,707 isoforms, with 64% of these isoforms matching known sequences in other mammalian species. Overall, our findings underscore the importance of using high-quality curated transcript models in combination with ab initio methods for effective genome annotation.
Evaluation of strategies for evidence-driven genome annotation using long-read RNA-seq. Genome Research. Published 2024-12-23. doi:10.1101/gr.279864.124 (https://doi.org/10.1101/gr.279864.124)
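The long-read annotation abstract above reports high agreement between two annotations at the locus level (90%) but much lower agreement at the transcript level (35%). The sketch below illustrates why the two numbers can diverge. It assumes, hypothetically, that a locus agrees when transcript spans overlap on the same chromosome and strand, and that a transcript agrees only when its full intron chain matches exactly. The `Transcript` type, the function names, and both criteria are illustrative and may differ from the comparison the authors used.

```python
from typing import NamedTuple, Sequence, Tuple


class Transcript(NamedTuple):
    chrom: str
    strand: str
    exons: Tuple[Tuple[int, int], ...]  # sorted (start, end) pairs


def structure_key(tx: Transcript):
    """Intron chain for multi-exon transcripts, the exon itself for mono-exon ones."""
    introns = tuple((a[1], b[0]) for a, b in zip(tx.exons, tx.exons[1:]))
    kind = ("introns", introns) if introns else ("exon", tx.exons)
    return (tx.chrom, tx.strand, kind)


def spans_overlap(a: Transcript, b: Transcript) -> bool:
    """True if the two transcripts share chromosome, strand and any position."""
    same_place = a.chrom == b.chrom and a.strand == b.strand
    return same_place and a.exons[0][0] < b.exons[-1][1] and b.exons[0][0] < a.exons[-1][1]


def agreement(query: Sequence[Transcript], reference: Sequence[Transcript]):
    """Return (locus-level, transcript-level) fraction of query models supported."""
    if not query:
        raise ValueError("query annotation is empty")
    ref_keys = {structure_key(r) for r in reference}
    locus_hits = sum(any(spans_overlap(q, r) for r in reference) for q in query)
    tx_hits = sum(structure_key(q) in ref_keys for q in query)
    return locus_hits / len(query), tx_hits / len(query)


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    ref = [Transcript("chr1", "+", ((100, 200), (300, 400)))]
    qry = [
        Transcript("chr1", "+", ((100, 200), (300, 400))),  # same isoform
        Transcript("chr1", "+", ((100, 200), (350, 400))),  # same locus, new splice site
        Transcript("chr2", "-", ((10, 50),)),               # locus missing from reference
    ]
    print(agreement(qry, ref))
```

In the toy data, two of three query models fall in a reference locus but only one reproduces a reference isoform, which is the same kind of gap the abstract describes.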
Simon Orozco-Arias, Pío Sierra, Richard Durbin, Josefa González
The number of species with high-quality genome sequences continues to increase, in part due to the scaling up of multiple large-scale biodiversity sequencing projects. While the need to annotate genic sequences in these genomes is widely acknowledged, the parallel need to annotate transposable element (TE) sequences that have been shown to alter genome architecture, rewire gene regulatory networks, and contribute to the evolution of host traits is becoming ever more evident. However, accurate genome-wide annotation of TE sequences is still technically challenging. Several de novo TE identification tools are now available, but manual curation of the libraries produced by these tools is needed to generate high-quality genome annotations. Manual curation is time-consuming, and thus impractical for large-scale genomic studies, and lacks reproducibility. In this work, we present the Manual Curator Helper tool MCHelper, which automates the TE library curation process. By leveraging MCHelper's fully automated mode with the outputs from three de novo TE identification tools, RepeatModeler2, EDTA, and REPET, in the fruit fly, rice, hooded crow, zebrafish, maize, and human, we show a substantial improvement in the quality of the TE libraries and genome annotations. MCHelper libraries are less redundant, with up to 65% reduction in the number of consensus sequences, have up to 11.4% fewer false positive sequences, and up to ∼48% fewer “unclassified/unknown” TE consensus sequences. Genome-wide TE annotations are also improved, including larger unfragmented insertions. Moreover, MCHelper is an easy-to-install and easy-to-use tool.
MCHelper automatically curates transposable element libraries across eukaryotic species. Genome Research. Published 2024-12-09. doi:10.1101/gr.278821.123 (https://doi.org/10.1101/gr.278821.123)
Stephanie C. Bohaczuk, Zachary J. Amador, Chang Li, Benjamin J. Mallory, Elliott G. Swanson, Jane Ranchalis, Mitchell R. Vollger, Katherine M. Munson, Tom Walsh, Morgan O. Hamm, Yizi Mao, Andre Lieber, Andrew B. Stergachis
Accurately quantifying the functional consequences of noncoding mosaic variants requires the pairing of DNA sequences with both accessible and closed chromatin architectures along individual DNA molecules—a pairing that cannot be achieved using traditional fragmentation-based chromatin assays. We demonstrate that targeted single-molecule chromatin fiber sequencing (Fiber-seq) achieves this, permitting single-molecule, long-read genomic, and epigenomic profiling across targeted >100 kb loci with ∼10-fold enrichment over untargeted sequencing. Targeted Fiber-seq reveals that pathogenic expansions of the DMPK CTG repeat that underlie Myotonic Dystrophy 1 are characterized by somatic instability and disruption of multiple nearby regulatory elements, both of which are repeat length-dependent. Furthermore, we reveal that therapeutic adenine base editing of the segmentally duplicated γ-globin (HBG1/HBG2) promoters in primary human hematopoietic cells induced toward an erythroblast lineage increases the accessibility of the HBG1 promoter as well as neighboring regulatory elements. Overall, we find that these non–protein coding mosaic variants can have complex impacts on chromatin architectures, including extending beyond the regulatory element harboring the variant.
Resolving the chromatin impact of mosaic variants with targeted Fiber-seq. Genome Research. Published 2024-12-09. doi:10.1101/gr.279747.124 (https://doi.org/10.1101/gr.279747.124)
Vanessa L. Porter, Michelle Ng, Kieran O'Neill, Signe MacLennan, Richard D. Corbett, Luka Culibrk, Zeid Hamadeh, Marissa Iden, Rachel Schmidt, Shirng-Wern Tsaih, Carolyn Nakisige, Martin Origa, Jackson Orem, Glenn Chang, Jeremy Fan, Ka Ming Nip, Vahid Akbari, Simon K. Chan, James Hopkins, Richard A. Moore, Eric Chuah, Karen L. Mungall, Andrew J. Mungall, Inanc Birol, Steven J.M. Jones, Janet S. Rader, Marco A. Marra
Human papillomavirus (HPV) integration has been implicated in transforming HPV infection into cancer. To resolve genome dysregulation associated with HPV integration, we performed Oxford Nanopore long-read sequencing on 72 cervical cancer genomes from a Ugandan dataset that was previously characterized using short-read sequencing. We found recurrent structural rearrangement patterns at HPV integration events, which we categorized as: del(etion)-like, dup(lication)-like, translocation, multibreakpoint, or repeat region integrations. Integrations involving amplified HPV-human concatemers, particularly multibreakpoint events, frequently harbored heterogeneous forms and copy numbers of the viral genome. Transcriptionally active integrants were characterized by unmethylated regions in both the viral and human genomes downstream from the viral transcription start site, resulting in HPV-human fusion transcripts. In contrast, integrants without evidence of expression lacked consistent methylation patterns. Furthermore, whereas transcriptional dysregulation was limited to genes within 200 kilobases of an HPV integrant, dysregulation of the human epigenome in the form of allelic differentially methylated regions affected megabase expanses of the genome, irrespective of the integrant's transcriptional status. By elucidating the structural, epigenetic, and allele-specific impacts of HPV integration, we provide insight into the role of integrated HPV in cervical cancer.
Rearrangements of viral and human genomes at human papillomavirus integration events and their allele-specific impacts on cancer genome regulation. Genome Research. Published 2024-12-05. doi:10.1101/gr.279041.124 (https://doi.org/10.1101/gr.279041.124)
Chong Li, Marc Jan Bonder, Sabriya Syed, Matthew Jensen, Human Genome Structural Variation Consortium (HGSVC), HGSVC Functional Analysis Working Group, Mark B. Gerstein, Michael C. Zody, Mark J.P. Chaisson, Michael E. Talkowski, Tobias Marschall, Jan O. Korbel, Evan E. Eichler, Charles Lee, Xinghua Shi
The human genome is packaged within a three-dimensional (3D) nucleus and organized into structural units known as compartments, topologically associating domains (TADs), and loops. TAD boundaries, separating adjacent TADs, have been found to be well conserved across mammalian species and more evolutionarily constrained than TADs themselves. Recent studies show that structural variants (SVs) can modify 3D genomes through the disruption of TADs, which play an essential role in insulating genes from aberrant regulation by outside regulatory elements. However, how SVs affect 3D genome structure, and how they relate to different aspects of gene regulation and to candidate cis-regulatory elements (cCREs), has rarely been studied systematically. Here, we assess the impact of SVs intersecting with TAD boundaries by developing an integrative Hi-C analysis pipeline, which enables the generation of an in-depth catalog of TADs and TAD boundaries in human lymphoblastoid cell lines (LCLs) to fill the gap of limited resources. Our catalog contains 18,865 TADs, including 4,596 sub-TADs, with 185 SVs (TAD–SVs) that alter chromatin architecture. By leveraging the ENCODE registry of cCREs in humans, we determine that 34 of 185 TAD–SVs intersect with cCREs and observe significant enrichment of TAD–SVs within cCREs. This study provides a database of TADs and TAD–SVs in the human genome that will facilitate future investigations of the impact of SVs on chromatin structure and gene regulation in health and disease.
An integrative TAD catalog in lymphoblastoid cell lines discloses the functional impact of deletions and insertions in human genomes. Genome Research. Published 2024-12-05. doi:10.1101/gr.279419.124 (https://doi.org/10.1101/gr.279419.124)
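The TAD catalog abstract above rests on finding SVs that intersect TAD boundaries. The sketch below shows only that interval-overlap step. The function name `svs_at_boundaries`, the closed-interval convention, and all coordinates are assumptions made for illustration. The authors' Hi-C pipeline is not described in the abstract and is not reproduced here.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), closed coordinates


def svs_at_boundaries(svs: Iterable[Interval], boundaries: Iterable[Interval]) -> List[Interval]:
    """Return the SVs that overlap at least one TAD-boundary window.

    Closed intervals are used so that an insertion recorded with
    start == end still counts when it falls inside a boundary window.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in boundaries:
        by_chrom[chrom].append((start, end))

    hits = []
    for chrom, start, end in svs:
        windows = by_chrom.get(chrom, ())
        if any(start <= b_end and b_start <= end for b_start, b_end in windows):
            hits.append((chrom, start, end))
    return hits


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    boundaries = [("chr1", 1_000_000, 1_040_000), ("chr2", 5_000_000, 5_040_000)]
    svs = [
        ("chr1", 1_020_000, 1_020_000),  # insertion inside a boundary window
        ("chr1", 2_000_000, 2_005_000),  # deletion far from any boundary
        ("chr2", 4_990_000, 5_001_000),  # deletion spanning a boundary edge
    ]
    print(svs_at_boundaries(svs, boundaries))
```

The linear scan per SV is adequate for a few thousand boundaries. A sorted index or interval tree would be the natural replacement at larger scale.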
Arne Sahm, Konstantin Riege, Marco Groth, Martin Bens, Johann Kraus, Martin Fischer, Hans Kestler, Christoph Englert, Ralf Schaible, Matthias Platzer, Steve Hoffmann
Growing evidence suggests that somatic mutations may be a major cause of the aging process. However, it remains to be tested whether the predictions of the theory also apply to species with longer life spans than humans. Hydra is a genus of freshwater polyps with remarkable regeneration abilities and a potentially unlimited life span under laboratory conditions. By genome sequencing of single cells and whole animals, we found that the mutation rates in Hydra’s stem cells are even slightly higher than in humans or mice. A potential explanation for this deviation from the prediction of the theory may lie in the adaptability offered by a higher mutation rate, as we were able to show that the genome of the widely studied Hydra magnipapillata strain 105 has undergone a process of strong positive selection since the strain's cultivation 50 years ago. This most likely represents a rapid adaptation to the drastically altered environmental conditions associated with the transition from the wild to laboratory conditions. Processes under positive selection in captive animals include pathways associated with Hydra’s simple nervous system, its nucleic acid metabolic process, cell migration, and hydrolase activity.
Hydra has mammal-like mutation rates facilitating fast adaptation despite its nonaging phenotype. Genome Research. Published 2024-12-04. doi:10.1101/gr.279025.124 (https://doi.org/10.1101/gr.279025.124)
Jonathan Cahn, James P.B. Lloyd, Ino D. Karemaker, Pascal W.T.C. Jansen, Jahnvi Pflueger, Owen Duncan, Jakob Petereit, Ozren Bogdanovic, A. Harvey Millar, Michiel Vermeulen, Ryan Lister
In plants, cytosine DNA methylation (mC) is largely associated with transcriptional repression of transposable elements, but it can also be found in the body of expressed genes, referred to as gene body methylation (gbM). gbM is correlated with ubiquitously expressed genes; however, its function, or absence thereof, is highly debated. The different outputs that mC can have raise questions as to how it is interpreted—or read—differently in these sequence and genomic contexts. To screen for potential mC-binding proteins, we performed an unbiased DNA affinity pull-down assay combined with quantitative mass spectrometry using methylated DNA probes for each DNA sequence context. All mC readers known to date preferentially bind to the methylated probes, along with a range of new mC-binding protein candidates. Functional characterization of these mC readers, focused on the MBD and SUVH families, was undertaken by ChIP-seq mapping of genome-wide binding sites, their protein interactors, and the impact of high-order mutations on transcriptomic and epigenomic profiles. Together, these results highlight specific context preferences for these proteins, and in particular the ability of MBD2 to bind predominantly to gbM. This comprehensive analysis of Arabidopsis mC readers emphasizes the complexity and interconnectivity between DNA methylation and chromatin remodeling processes in plants.
{"title":"Characterization of DNA methylation reader proteins of Arabidopsis thaliana","authors":"Jonathan Cahn, James P.B. Lloyd, Ino D. Karemaker, Pascal W.T.C. Jansen, Jahnvi Pflueger, Owen Duncan, Jakob Petereit, Ozren Bogdanovic, A. Harvey Millar, Michiel Vermeulen, Ryan Lister","doi":"10.1101/gr.279379.124","DOIUrl":"https://doi.org/10.1101/gr.279379.124","journal":{"name":"Genome research"},"publicationDate":"2024-12-04","publicationTypes":"Journal Article"}