
Genome Research: latest articles

Unravelling the architecture of major histocompatibility complex class II haplotypes in rhesus macaques
IF 7, Zone 2 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY. Pub Date: 2024-10-23. DOI: 10.1101/gr.278968.124
Nanine de Groot, Marit van der Wiel, Ngoc Giang Le, Natasja G. de Groot, Jesse Bruijnesteijn, Ronald E. Bontrop
The regions in the genome that encode components of the immune system are often featured by polymorphism, copy number variation, and segmental duplications. There is a need to thoroughly characterize these complex regions to gain insight into the impact of genomic diversity on health and disease. Here we resolve the organization of complete major histocompatibility complex (MHC) class II regions in rhesus macaques by using a long-read sequencing strategy (Oxford Nanopore Technologies) in concert with adaptive sampling. In particular, the expansion and contraction of the primate DRB-region appear to be a dynamic process that involves the rearrangement of different cassettes of paralogous genes. These chromosomal recombination events are propagated by a conserved pseudogene, DRB6, which features the integration of two retroviral elements. In contrast, the DRA locus appears to be protected from rearrangements, which may be owing to the presence of an adjacently located truncated gene segment, DRB9. With our sequencing strategy, the annotation, evolutionary conservation, and potential function of pseudogenes can be reassessed, an aspect that was neglected by most genome studies in primates. Furthermore, our approach facilitates the characterization and refinement of an animal model essential to study human biology and disease.
Citations: 0
Candida albicans isolates contain frequent heterozygous structural variants and transposable elements within genes and centromeres
IF 7, Zone 2 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY. Pub Date: 2024-10-22. DOI: 10.1101/gr.279301.124
Ursula Oggenfuss, Robert T Todd, Natthapon Soisangwan, Bailey Kemp, Alison Guyer, Annette Beach, Anna Selmecki
The human fungal pathogen Candida albicans poses a significant burden on global health, causing high rates of mortality and antifungal drug resistance. C. albicans is a heterozygous diploid organism that reproduces asexually. Structural variants (SVs) are an important source of genomic rearrangement, particularly in species that lack sexual recombination. To comprehensively investigate SVs across clinical isolates of C. albicans, we conducted long read sequencing and genome-wide SV analysis in three distantly related clinical isolates. Our work included a new, comprehensive analysis of transposable element (TE) composition, location and diversity. SVs and TEs are frequently close to coding sequences and many SVs are heterozygous, suggesting that SVs might impact gene and allele-specific expression. Most SVs are uniquely present in only one clinical isolate, indicating that SVs represent a significant source of intra-species genetic variation. We identified multiple, distinct SVs at the centromeres of Chromosome 4 and Chromosome 5, including inversions and transposon polymorphisms. These two chromosomes are often aneuploid in drug resistant clinical isolates, and can form isochromosome structures with breakpoints near the centromere. Further screening of 100 clinical isolates confirmed the widespread presence of centromeric SVs in C. albicans, often appearing in a heterozygous state, indicating that SVs are contributing to centromere evolution in C. albicans. Together, these findings highlight that SVs and TEs are common across diverse clinical isolates of C. albicans and that the centromeres of this organism are important sites of genome rearrangement.
Citations: 0
Long-read genome assembly of the insect model organism Tribolium castaneum reveals spread of satellite DNA in gene-rich regions by recurrent burst events
IF 7, Zone 2 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY. Pub Date: 2024-10-22. DOI: 10.1101/gr.279225.124
Marin Volarić, Evelin Despot-Slade, Damira Veseljak, Brankica Mravinac, Nevenka Meštrović
Eukaryotic genomes are replete with satellite DNAs (satDNAs), large stretches of tandemly repeated sequences that are mostly underrepresented in genome assemblies. Here we combined nanopore long-read sequencing with a reference-guided assembly approach to generate an improved, high-quality genome assembly, TcasONT, of the model beetle Tribolium castaneum. Enriched by 45 Mb in repetitive regions, the new assembly comprises almost the entire genome sequence. We use the enhanced assembly to conduct global and in-depth analyses of abundant euchromatic satDNAs. Unexpectedly, we show the extensive spread of satDNAs in gene-rich regions, including long arrays. The sequence similarity relationships between satDNA monomers and arrays indicate a recent exchange of satDNA arrays between different chromosomes. We propose a scenario of their genome dynamics characterized by repeated bursts of satDNAs spreading through euchromatin, followed by a process of elongation and homogenization of arrays. We find that suppressed recombination on the X Chromosome has no significant effect on the spread of satDNAs but the X rather tolerates the amplification of satDNAs into longer arrays. Analyses of arrays’ neighboring regions show a tendency of one satDNA to be associated with transposable-like elements. Using 2D electrophoresis followed by Southern blotting, we prove Cast satDNAs’ presence in the fraction of extrachromosomal circular DNA (eccDNA). We point to two mechanisms that enable this satDNA spread to occur: transposition by transposable elements and insertion mediated by eccDNA. The presence of such a large proportion of satDNA in gene-rich regions inevitably gives rise to speculation about their possible influence on gene expression.
Citations: 0
Binding profiles for 961 Drosophila and C. elegans transcription factors reveal tissue-specific regulatory relationships
IF 7, Zone 2 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY. Pub Date: 2024-10-22. DOI: 10.1101/gr.279037.124
Michelle Kudron, Louis Gewirtzman, Alec Victorsen, Bridget C Lear, Dionne Vafeados, Jiahao Gao, Jinrui Xu, Swapna Samanta, Emily Frink, Adri Tran-Pearson, Chau Hyunh, Ann Hammonds, William Fisher, Martha L Wall, Greg Wesseling, Vanessa Hernandez, Zhichun Lin, Mary Kasparian, Kevin P White, Ravi Allada, Mark Gerstein, LaDeana Hillier, Susan E Celniker, Valerie Reinke, Robert Waterston
A catalog of transcription factor (TF) binding sites in the genome is critical for deciphering regulatory relationships. Here we present the culmination of the efforts of the Model Organism ENCyclopedia Of DNA Elements (modENCODE) and the model organism Encyclopedia of Regulatory Networks (modERN) consortia to systematically assay TF binding events in vivo in two major model organisms, Drosophila melanogaster (fly) and Caenorhabditis elegans (worm). These datasets comprise 605 TFs identifying 3.6M sites in the fly and 356 TFs identifying 0.9 M sites in the worm and represent the majority of the regulatory space in each genome. We demonstrate that TFs associate with chromatin in clusters termed "metapeaks", that larger metapeaks have characteristics of high occupancy target (HOT) regions, and that the importance of consensus sequence motifs bound by TFs depends on metapeak size and complexity. Combining ChIP-seq data with single cell RNA-seq data in a machine learning model identifies TFs with a prominent role in promoting target gene expression in specific cell types, even differentiating between parent-daughter cells during embryogenesis. These data are a rich resource for the community that should fuel and guide future investigations into TF function. To facilitate data accessibility and utility, all strains expressing GFP-tagged TFs are available at the stock centers for each organism. The chromatin immunoprecipitation sequencing data are available through the ENCODE Data Coordinating Center, GEO, and through a direct interface that provides rapid access to processed data sets and summary analyses, as well as widgets to probe the cell type-specific TF-target relationships.
Citations: 0
Inferring ancestry with the hierarchical soft clustering approach tangleGen.
IF 6.2, Zone 2 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY. Pub Date: 2024-10-21. DOI: 10.1101/gr.279399.124
Klara Elisabeth Burger, Solveig Klepper, Ulrike von Luxburg, Franz Baumdicker

Understanding the genetic ancestry of populations is central to numerous scientific and societal fields. It contributes to a better understanding of human evolutionary history, advances personalized medicine, aids in forensic identification, and allows individuals to connect to their genealogical roots. Existing methods, such as ADMIXTURE, have significantly improved our ability to infer ancestries. However, these methods typically work with a fixed number of independent ancestral populations. As a result, they provide insight into genetic admixture, but do not include a hierarchical interpretation. In particular, the intricate ancestral population structures remain difficult to unravel. Alternative methods with a consistent inheritance structure, such as hierarchical clustering, may offer benefits in terms of interpreting the inferred ancestries. Here, we present tangleGen, a soft clustering tool that transfers the hierarchical machine learning framework Tangles, which leverages graph theoretical concepts, to the field of population genetics. The hierarchical perspective of tangleGen on the composition and structure of populations improves the interpretability of the inferred ancestral relationships. Moreover, tangleGen adds a new layer of explainability, as it allows identifying the SNPs that are responsible for the clustering structure. We demonstrate the capabilities and benefits of tangleGen for the inference of ancestral relationships, using both simulated data and data from the 1000 Genomes Project.

Citations: 0
Ultrasensitive allele inference from immune repertoire sequencing data with MiXCR.
IF 6.2, Zone 2 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY. Pub Date: 2024-10-21. DOI: 10.1101/gr.278775.123
Artem Mikelov, George Nefedev, Aleksandr Tashkeev, Oscar L Rodriguez, Diego A Ortmans, Valeriia Skatova, Mark Izraelson, Alexey N Davydov, Stanislav Poslavsky, Souad Rahmouni, Corey T Watson, Dmitriy M Chudakov, Scott D Boyd, Dmitry A Bolotin

Allelic variability in the adaptive immune receptor loci, which harbor the gene segments that encode B cell and T cell receptors (BCR/TCR), is of critical importance for immune responses to pathogens and vaccines. Adaptive immune receptor repertoire sequencing (AIRR-seq) has become widespread in immunology research making it the most readily available source of information about allelic diversity in immunoglobulin (IG) and T cell receptor (TR) loci. Here we present a novel algorithm for extra-sensitive and specific variable (V) and joining (J) gene allele inference, allowing reconstruction of individual high-quality gene segment libraries. The approach can be applied for inferring allelic variants from peripheral blood lymphocyte BCR and TCR repertoire sequencing data, including hypermutated isotype-switched BCR sequences, thus allowing high-throughput novel allele discovery from a wide variety of existing datasets. The developed algorithm is a part of the MiXCR software. We demonstrate the accuracy of this approach using AIRR-seq paired with long-read genomic sequencing data, comparing it to a widely used algorithm, TIgGER. We applied the algorithm to a large set of IG heavy chain (IGH) AIRR-seq data from 450 donors of ancestrally diverse population groups, and to the largest reported full-length TCR alpha and beta chain (TRA; TRB) AIRR-seq dataset, representing 134 individuals. This allowed us to assess the genetic diversity within the IGH, TRA and TRB loci in different populations and to establish a database of alleles of V and J genes inferred from AIRR-seq data and their population frequencies with free public access through an online database.

Citations: 0
Analyzing super-enhancer temporal dynamics reveals potential critical enhancers and their gene regulatory networks underlying skeletal muscle development.
IF 6.2, Zone 2 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY. Pub Date: 2024-10-21. DOI: 10.1101/gr.278344.123
Song Zhang, Chao Wang, Shenghua Qin, Choulin Chen, Yongzhou Bao, Yuanyuan Zhang, Lingna Xu, Qingyou Liu, Yunxiang Zhao, Kui Li, Zhonglin Tang, Yuwen Liu

Super-enhancers (SEs) govern the expression of genes defining cell identity. However, the dynamic landscape of SEs and their critical constituent enhancers involved in skeletal muscle development remains unclear. In this study, using pig as a model, we employed CUT&Tag to profile the enhancer-associated histone modification marker H3K27ac in skeletal muscle across two prenatal and three postnatal stages and investigated how SEs influence skeletal muscle development. We identified three SE families with distinct temporal dynamics: continuous (Con, 397), transient (TS, 434), and de novo (DN, 756). These SE families are associated with different temporal gene expression trajectories, biological functions, and DNA methylation levels. Notably, several lines of evidence suggest a potential prominent role of Con SEs in regulating porcine muscle development and meat traits. To pinpoint key cis-regulatory units in Con SEs, we developed an integrative approach that leverages information from eRNA annotation, GWAS signals and high-throughput capture STARR-seq experiments. Within Con SEs, we identified 20 candidate critical enhancers with meat and carcass-associated DNA variations that affect enhancer activity and inferred their upstream TFs and downstream target genes. As a proof of concept, we experimentally validated the role of one such enhancer and its potential target gene during myogenesis. Our findings reveal the dynamic regulatory features of SEs in skeletal muscle development and provide a general integrative framework for identifying critical enhancers underlying the formation of complex traits.

Citations: 0
Chromatin interaction maps identify oncogenic targets of enhancer duplications in cancer
IF 7 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2024-10-18 DOI: 10.1101/gr.278418.123
Yueqiang Song, Fuyuan Li, Shangzi Wang, Yuntong Wang, Cong Lai, Lian Chen, Ning Jiang, Jin Li, Xingdong Chen, Swneke D. Bailey, Xiaoyang Zhang
As a major type of structural variant, tandem duplication plays a critical role in tumorigenesis by increasing oncogene dosage. Recent work has revealed that noncoding enhancers are also affected by duplications, leading to the activation of oncogenes that are inside or outside of the duplicated regions. However, the prevalence of enhancer duplication and the identity of their target genes remain largely unknown in the cancer genome. Here, by analyzing whole-genome sequencing data in a non-gene-centric manner, we identify 881 duplication hotspots in 13 major cancer types, most of which do not contain protein-coding genes. We show that the hotspots are enriched with distal enhancer elements and are highly lineage-specific. We develop a HiChIP-based methodology that navigates enhancer–promoter contact maps to prioritize the target genes for the duplication hotspots harboring enhancer elements. The methodology identifies many novel enhancer duplication events activating oncogenes such as ESR1, FOXA1, GATA3, GATA6, TP63, and VEGFA, as well as potentially novel oncogenes such as GRHL2, IRF2BP2, and CREB3L1. In particular, we identify a duplication hotspot on Chromosome 10p15 harboring a cluster of enhancers, which skips over two genes, through a long-range chromatin interaction, to activate an oncogenic isoform of the NET1 gene to promote migration of gastric cancer cells. Focusing on tandem duplications, our study substantially extends the catalog of noncoding driver alterations in multiple cancer types, revealing attractive targets for functional characterization and therapeutic intervention.
Citations: 0
Dynamic dysregulation of retrotransposons in neurodegenerative diseases at the single-cell level
IF 7 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2024-10-18 DOI: 10.1101/gr.279363.124
Wankun Deng, Citu Citu, Andi Liu, Zhongming Zhao
Retrotransposable elements (RTEs) are common mobile genetic elements comprising ∼42% of the human genome. RTEs play critical roles in gene regulation and function, but how they are specifically involved in complex diseases is largely unknown. Here, we investigate the cellular heterogeneity of RTEs using 12 single-cell transcriptome profiles covering three neurodegenerative diseases, Alzheimer's disease (AD), Parkinson's disease, and multiple sclerosis. We identify cell type marker RTEs in neurons, astrocytes, oligodendrocytes, and oligodendrocyte precursor cells that are related to these diseases. The differential expression analysis reveals the landscape of dysregulated RTE expression, especially L1s, in excitatory neurons of multiple neurodegenerative diseases. Machine learning algorithms for predicting cell disease stage using a combination of RTE and gene expression features suggest dynamic regulation of RTEs in AD. Furthermore, we construct a single-cell atlas of retrotransposable elements in neurodegenerative disease (scARE) using these data sets and features. scARE has six feature analysis modules to explore RTE dynamics in a user-defined condition. To our knowledge, scARE represents the first systematic investigation of RTE dynamics at the single-cell level within the context of neurodegenerative diseases.
Citations: 0
De novo genome assemblies of two cryptodiran turtles with ZZ/ZW and XX/XY sex chromosomes provide insights into patterns of genome reshuffling and uncover novel 3D genome folding in amniotes
IF 7 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2024-10-16 DOI: 10.1101/gr.279443.124
Basanta Bista, Laura González-Rodelas, Lucía Álvarez-González, Zhi-qiang Wu, Eugenia E. Montiel, Ling Sze Lee, Daleen B. Badenhorst, Srihari Radhakrishnan, Robert Literman, Beatriz Navarro-Dominguez, John B. Iverson, Simon Orozco-Arias, Josefa González, Aurora Ruiz-Herrera, Nicole Valenzuela
Understanding the evolution of chromatin conformation among species is fundamental to elucidate the architecture and plasticity of genomes. Nonrandom interactions of linearly distant loci regulate gene function in species-specific patterns, affecting genome function, evolution, and, ultimately, speciation. Yet, data from nonmodel organisms are scarce. To capture the macroevolutionary diversity of vertebrate chromatin conformation, here we generate de novo genome assemblies for two cryptodiran (hidden-neck) turtles via Illumina sequencing, chromosome conformation capture, and RNA-seq: Apalone spinifera (ZZ/ZW, 2n = 66) and Staurotypus triporcatus (XX/XY, 2n = 54). We detected differences in the three-dimensional (3D) chromatin structure in turtles compared to other amniotes beyond the fusion/fission events detected in the linear genomes. Namely, whole-genome comparisons revealed distinct trends of chromosome rearrangements in turtles: (1) a low rate of genome reshuffling in Apalone (Trionychidae) whose karyotype is highly conserved when compared to chicken (likely ancestral for turtles), and (2) a moderate rate of fusions/fissions in Staurotypus (Kinosternidae) and Trachemys scripta (Emydidae). Furthermore, we identified a chromosome folding pattern that enables “centromere–telomere interactions” previously undetected in turtles. The combined turtle pattern of “centromere–telomere interactions” (discovered here) plus “centromere clustering” (previously reported in sauropsids) is novel for amniotes and it counters previous hypotheses about amniote 3D chromatin structure. We hypothesize that the divergent pattern found in turtles originated from an amniote ancestral state defined by a nuclear configuration with extensive associations among microchromosomes that were preserved upon the reshuffling of the linear genome.
Citations: 0