Genome Research: Latest Articles

An integrative TAD catalog in lymphoblastoid cell lines discloses the functional impact of deletions and insertions in human genomes
IF 7 | Zone 2 (Biology) | Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2024-12-05 | DOI: 10.1101/gr.279419.124
Chong Li, Marc Jan Bonder, Sabriya Syed, Matthew Jensen, Human Genome Structural Variation Consortium (HGSVC), HGSVC Functional Analysis Working Group, Mark B. Gerstein, Michael C. Zody, Mark J.P. Chaisson, Michael E. Talkowski, Tobias Marschall, Jan O. Korbel, Evan E. Eichler, Charles Lee, Xinghua Shi
The human genome is packaged within a three-dimensional (3D) nucleus and organized into structural units known as compartments, topologically associating domains (TADs), and loops. TAD boundaries, which separate adjacent TADs, have been found to be well conserved across mammalian species and more evolutionarily constrained than TADs themselves. Recent studies show that structural variants (SVs) can modify 3D genomes by disrupting TADs, which play an essential role in insulating genes from aberrant regulation by outside regulatory elements. However, how SVs affect 3D genome structure, and how these effects relate to different aspects of gene regulation and to candidate cis-regulatory elements (cCREs), has rarely been studied systematically. Here, we assess the impact of SVs intersecting with TAD boundaries by developing an integrative Hi-C analysis pipeline, which enables the generation of an in-depth catalog of TADs and TAD boundaries in human lymphoblastoid cell lines (LCLs), addressing the scarcity of such resources. Our catalog contains 18,865 TADs, including 4596 sub-TADs, with 185 SVs (TAD–SVs) that alter chromatin architecture. By leveraging the ENCODE registry of cCREs in humans, we determine that 34 of 185 TAD–SVs intersect with cCREs and observe significant enrichment of TAD–SVs within cCREs. This study provides a database of TADs and TAD–SVs in the human genome that will facilitate future investigations of the impact of SVs on chromatin structure and gene regulation in health and disease.
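As a rough illustration of what "SVs intersecting with TAD boundaries" means computationally, the sketch below tests toy genomic intervals for overlap. It is not the authors' pipeline: the `Interval` type, the helper names, and all coordinates are hypothetical, and half-open coordinates are an assumed convention. The only figures taken from the abstract are 34 and 185.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """A genomic interval; half-open [start, end) is an assumed convention."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True when both intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def svs_at_boundaries(svs: List[Interval], boundaries: List[Interval]) -> List[Interval]:
    """Keep the SVs that overlap at least one TAD boundary (quadratic scan, fine for a toy)."""
    return [sv for sv in svs if any(overlaps(sv, b) for b in boundaries)]


# Hypothetical toy data, not taken from the catalog.
boundaries = [Interval("chr1", 1000, 1200), Interval("chr1", 5000, 5200)]
svs = [
    Interval("chr1", 1100, 1500),  # overlaps the first boundary
    Interval("chr1", 3000, 3100),  # lies between boundaries
    Interval("chr2", 1100, 1500),  # same coordinates, different chromosome
]
hits = svs_at_boundaries(svs, boundaries)

# Share of TAD-SVs that intersect cCREs, using the counts stated in the abstract.
ccre_percent = 100 * 34 / 185
```

With the reported counts, roughly 18% of the TAD–SVs intersect a cCRE. A real analysis would typically use an indexed interval structure rather than a nested scan.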
Citations: 0
Hydra has mammal-like mutation rates facilitating fast adaptation despite its nonaging phenotype
IF 7 | Zone 2 (Biology) | Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2024-12-04 | DOI: 10.1101/gr.279025.124
Arne Sahm, Konstantin Riege, Marco Groth, Martin Bens, Johann Kraus, Martin Fischer, Hans Kestler, Christoph Englert, Ralf Schaible, Matthias Platzer, Steve Hoffmann
Growing evidence suggests that somatic mutations may be a major cause of the aging process. However, it remains to be tested whether the predictions of the theory also apply to species with longer life spans than humans. Hydra is a genus of freshwater polyps with remarkable regeneration abilities and a potentially unlimited life span under laboratory conditions. By genome sequencing of single cells and whole animals, we found that the mutation rates in Hydra’s stem cells are even slightly higher than in humans or mice. A potential explanation for this deviation from the prediction of the theory may lie in the adaptability offered by a higher mutation rate, as we were able to show that the genome of the widely studied Hydra magnipapillata strain 105 has undergone a process of strong positive selection since the strain's cultivation 50 years ago. This most likely represents a rapid adaptation to the drastically altered environmental conditions associated with the transition from the wild to laboratory conditions. Processes under positive selection in captive animals include pathways associated with Hydra’s simple nervous system, its nucleic acid metabolic process, cell migration, and hydrolase activity.
Citations: 0
Characterization of DNA methylation reader proteins of Arabidopsis thaliana
IF 7 | Zone 2 (Biology) | Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2024-12-04 | DOI: 10.1101/gr.279379.124
Jonathan Cahn, James P.B. Lloyd, Ino D. Karemaker, Pascal W.T.C. Jansen, Jahnvi Pflueger, Owen Duncan, Jakob Petereit, Ozren Bogdanovic, A. Harvey Millar, Michiel Vermeulen, Ryan Lister
In plants, cytosine DNA methylation (mC) is largely associated with transcriptional repression of transposable elements, but it can also be found in the body of expressed genes, referred to as gene body methylation (gbM). gbM is correlated with ubiquitously expressed genes; however, its function, or absence thereof, is highly debated. The different outputs that mC can have raise questions as to how it is interpreted—or read—differently in these sequence and genomic contexts. To screen for potential mC-binding proteins, we performed an unbiased DNA affinity pull-down assay combined with quantitative mass spectrometry using methylated DNA probes for each DNA sequence context. All mC readers known to date preferentially bind to the methylated probes, along with a range of new mC-binding protein candidates. Functional characterization of these mC readers, focused on the MBD and SUVH families, was undertaken by ChIP-seq mapping of genome-wide binding sites, their protein interactors, and the impact of high-order mutations on transcriptomic and epigenomic profiles. Together, these results highlight specific context preferences for these proteins, and in particular the ability of MBD2 to bind predominantly to gbM. This comprehensive analysis of Arabidopsis mC readers emphasizes the complexity and interconnectivity between DNA methylation and chromatin remodeling processes in plants.
Citations: 0
Structure-optimized sgRNA selection with PlatinumCRISPr for efficient Cas9 generation of knockouts
IF 7 | Zone 2 (Biology) | Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2024-12-03 | DOI: 10.1101/gr.279479.124
Irmgard U. Haussmann, Thomas C. Dix, David W.J. McQuarrie, Veronica Dezi, Abdullah I. Hans, Roland Arnold, Matthias Soller
A single guide RNA (sgRNA) directs Cas9 nuclease for gene-specific scission of double-stranded DNA. High Cas9 activity is essential for efficient gene editing to generate gene deletions and gene replacements by homologous recombination. However, cleavage efficiency is below 50% for more than half of randomly selected sgRNA sequences in human cell culture screens or model organisms. We used in vitro assays to determine intrinsic molecular parameters for maximal sgRNA activity including correct folding of sgRNAs and Cas9 structural information. From the comparison of over 10 data sets, we find major constraints in sgRNA design originating from defective secondary structure of the sgRNA, sequence context of the seed region, GC context, and detrimental motifs, but we also find considerable variation among different prediction tools when applied to different data sets. To aid selection of efficient sgRNAs, we developed web-based PlatinumCRISPr, an sgRNA design tool to evaluate base-pairing and sequence composition parameters for optimal design of highly efficient sgRNAs for Cas9 genome editing. We applied this tool to select sgRNAs to efficiently generate gene deletions in Drosophila Ythdc1 and Ythdf, that bind to N6 methylated adenosines (m6A) in mRNA. However, we discovered that generating small deletions with sgRNAs and Cas9 leads to ectopic reinsertion of the deleted DNA fragment elsewhere in the genome. These insertions can be removed by standard genetic recombination and chromosome exchange. These new insights into sgRNA design and the mechanisms of CRISPR–Cas9 genome editing advance the efficient use of this technique for safer applications in humans.
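One of the sequence composition parameters the abstract names is GC context. The sketch below computes the GC fraction of a guide sequence as a minimal example of such a parameter. It is not PlatinumCRISPr's scoring: the function name and the example guide are hypothetical, and no efficiency threshold is implied.

```python
def gc_fraction(guide: str) -> float:
    """Fraction of G and C bases in a guide sequence (case-insensitive)."""
    seq = guide.upper()
    if not seq:
        raise ValueError("guide sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


# Hypothetical 20-nt guide, invented for illustration only.
example_guide = "GACGTTAGCCATGGATCCAA"
example_gc = gc_fraction(example_guide)
```

A design tool would combine a composition measure like this with the other factors the abstract lists, such as sgRNA secondary structure and seed-region sequence context.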
Citations: 0
The rate and spectrum of new mutations in mice inferred by long-read sequencing
IF 7 | Zone 2 (Biology) | Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2024-12-02 | DOI: 10.1101/gr.279982.124
Eugenio López-Cortegano, Jobran Chebib, Anika Jonas, Anastasia Vock, Sven Künzel, Peter D. Keightley, Diethard Tautz
All forms of genetic variation originate from new mutations, making it crucial to understand their rates and mechanisms. Here, we use long-read PacBio sequencing to investigate de novo mutations that accumulated in 12 inbred mouse lines derived from three commonly used inbred strains (C3H, C57BL/6, and FVB) maintained for 8-15 generations in a mutation accumulation (MA) experiment. We built chromosome-level genome assemblies based on the MA line founders' genomes, and then employed a combination of read and assembly-based methods to call the complete spectrum of new mutations. On average, there are ~45 mutations per haploid genome per generation, about half of which (54%) are insertions and deletions shorter than 50 bp (indels). The remainder are single nucleotide mutations (SNMs, 44%) and large structural mutations (SMs, 2%). We found that the degree of DNA repetitiveness is positively correlated with SNM and indel rates, and that a substantial fraction of SMs can be explained by homology-dependent mechanisms associated with repeat sequences. Most (90%) indels can be attributed to microsatellite contractions and expansions, and there is a marked bias towards 4 bp indels. Among the different types of SMs, tandem repeat mutations have the highest mutation rate, followed by insertions of transposable elements (TEs). We uncover a rich landscape of active TEs, and notable differences in their spectrum among MA lines and strains, and a high rate of gene retroposition. Our study offers novel insights into mammalian genome evolution, and highlights the importance of repetitive elements in shaping genomic diversity.
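The percentages in this abstract can be turned into approximate per-generation counts with simple arithmetic. The sketch below does that; the variable names are illustrative, and because the abstract gives the total only as about 45, every derived number is approximate.

```python
# Figures stated in the abstract: ~45 new mutations per haploid genome per generation,
# split into indels (54%), single nucleotide mutations (44%), and structural mutations (2%).
total_per_generation = 45
class_share = {"indels": 0.54, "SNMs": 0.44, "SMs": 0.02}

# Approximate number of new mutations of each class per haploid genome per generation.
per_generation = {name: total_per_generation * share for name, share in class_share.items()}

# The abstract attributes most (90%) indels to microsatellite contractions and expansions.
microsatellite_indels = per_generation["indels"] * 0.90

for name, count in per_generation.items():
    print(f"{name}: ~{count:.1f} per haploid genome per generation")
print(f"microsatellite indels: ~{microsatellite_indels:.1f}")
```

Under these figures, about 24 indels, 20 SNMs, and fewer than one structural mutation arise per haploid genome per generation, with roughly 22 of the indels falling in microsatellites.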
Citations: 0
Inferring disease progressive stages in single-cell transcriptomics using a weakly-supervised deep learning approach
IF 7 | Zone 2 (Biology) | Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2024-12-02 | DOI: 10.1101/gr.278812.123
Fabien Wehbe, Levi Adams, Jordan Babadoudou, Samantha Yuen, Yoon-Seong Kim, Yoshiaki Tanaka
Application of single-cell/nucleus genomic sequencing to patient-derived tissues offers potential solutions to delineate disease mechanisms in humans. However, individual cells in patient-derived tissues are at different pathological stages, and this cellular variability impedes subsequent differential gene expression analyses. To overcome this heterogeneity, we present a novel deep learning approach, scIDST, that infers the disease progression level of individual cells within a weak supervision framework. The cells inferred to be disease-progressive displayed significant differential expression of disease-relevant genes, which could not be detected by comparative analysis between patients and healthy donors. In addition, we demonstrated that models pretrained with scIDST are applicable to multiple independent data resources and are advantageous for inferring cells related to certain disease risks and comorbidities. Taken together, scIDST offers a new strategy for single-cell sequencing analysis to identify bona fide disease-associated molecular features.
Citations: 0
A low-abundance class of Dicer-dependent siRNAs produced from a variety of features in C. elegans
IF 7 | Zone 2 (Biology) | Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2024-12-02 | DOI: 10.1101/gr.279083.124
Thiago L. Knittel, Brooke E. Montgomery, Alex J. Tate, Ennis W. Deihl, Anastasia S. Nawrocki, Frederic J. Hoerndli, Taiowa A. Montgomery
Canonical small interfering RNAs (siRNAs) are processed from double-stranded RNA (dsRNA) by Dicer and associate with Argonautes to direct RNA silencing. In Caenorhabditis elegans, 22G-RNAs and 26G-RNAs are often referred to as siRNAs but display distinct characteristics. For example, 22G-RNAs do not originate from dsRNA and do not depend on Dicer, whereas 26G-RNAs require Dicer but derive from an atypical RNA duplex and are produced exclusively antisense to their messenger RNA (mRNA) templates. To identify canonical siRNAs in C. elegans, we first characterized the siRNAs produced via the exogenous RNA interference (RNAi) pathway. During RNAi, dsRNA is processed into ∼23 nt duplexes with ∼2 nt, 3′-overhangs, ultimately yielding siRNAs devoid of 5′G-containing sequences that bind with high affinity to the Argonaute RDE-1, but also to the microRNA (miRNA) pathway Argonaute, ALG-1. Using these characteristics, we searched for their endogenous counterparts and identified thousands of endogenous loci representing dozens of unique elements that give rise to mostly low to moderate levels of siRNAs, called 23H-RNAs. These loci include repetitive elements, putative coding genes, pseudogenes, noncoding RNAs, and unannotated features, many of which adopt hairpin (hp) structures reminiscent of the hpRNA/RNAi pathway in flies and mice. RDE-1 competes with other Argonautes for binding to 23H-RNAs. When RDE-1 is depleted, these siRNAs are enriched in ALG-1 and ALG-2 complexes. Our results expand the known repertoire of C. elegans small RNAs and their Argonaute interactors, and demonstrate that key features of the endogenous siRNA pathway are relatively unchanged in animals.
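The two read-level characteristics the abstract gives for these siRNAs, a length of about 23 nt and the absence of a 5′ G, can be expressed as a simple filter. The sketch below assumes that "23H" follows the IUPAC code H (A, C, or U, that is, not G) and that length is fixed at exactly 23 nt. Both are simplifying assumptions, the function name is hypothetical, and this is not the authors' classification procedure.

```python
def is_23h_candidate(read: str, length: int = 23) -> bool:
    """True for a read of the given length whose 5' nucleotide is not G.

    DNA-style reads are accepted: T is converted to U before the check.
    Assumption: H is the IUPAC code for A, C, or U (anything but G).
    """
    seq = read.upper().replace("T", "U")
    return len(seq) == length and seq[0] in "ACU"


# Hypothetical reads, invented for illustration.
reads = {
    "A" + "C" * 22: "23 nt, 5' A",
    "G" + "C" * 22: "23 nt, 5' G",
    "U" * 22: "22 nt, 5' U",
    "t" + "a" * 22: "23 nt, lowercase DNA, 5' T",
}
candidates = [seq for seq in reads if is_23h_candidate(seq)]
```

Real small-RNA data would also need mapping, strand, and Dicer-dependence evidence, none of which this filter captures.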
Citations: 0
Single-nucleus CUT&RUN elucidates the function of intrinsic and genomics-driven epigenetic heterogeneity in head and neck cancer progression
IF 7 Zone 2 Biology Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2024-12-02 DOI: 10.1101/gr.279105.124
Howard Womersley, Daniel Muliaditan, Ramanuj DasGupta, Lih Feng Cheow
Interrogating regulatory epigenetic alterations during tumor progression at the resolution of single cells has remained an understudied area of research. Here we developed a highly sensitive single-nucleus CUT&RUN (snCUT&RUN) assay to profile histone modifications in isogenic primary, metastatic, and cisplatin-resistant head and neck squamous cell carcinoma (HNSCC) patient-derived tumor cell lines. We find that the epigenome can be involved in diverse modes to contribute towards HNSCC progression. First, we demonstrate that gene expression changes during HNSCC progression can be comodulated by alterations in both copy number and chromatin activity, driving epigenetic rewiring of cell states. Furthermore, intratumour epigenetic heterogeneity (ITeH) may predispose subclonal populations within the primary tumour to adapt to selective pressures and foster the acquisition of malignant characteristics. In conclusion, snCUT&RUN serves as a valuable addition to the existing toolkit of single-cell epigenomic assays and can be used to dissect the functionality of the epigenome during cancer progression.
Citations: 0
Chimeric mitochondrial RNA transcripts predict mitochondrial genome deletion mutations in mitochondrial genetic diseases and aging
IF 7 Zone 2 Biology Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2024-11-27 DOI: 10.1101/gr.279072.124
Amy R Vandiver, Allen Herbst, Paul Stothard, Jonathan Wanagat
While it is well understood that mitochondrial DNA (mtDNA) deletion mutations cause incurable diseases and contribute to aging, little is known about the transcriptional products that arise from these DNA structural variants. We hypothesized that mitochondrial genomes containing deletion mutations express chimeric mitochondrial RNAs. To test this, we analyzed human and rat RNA sequencing data to identify, quantitate, and characterize chimeric mitochondrial RNAs. We observed increased chimeric mitochondrial RNA frequency in samples from patients with mitochondrial genetic diseases and in samples from aged humans. The spectrum of chimeric mitochondrial transcripts reflected the known pattern of mtDNA deletion mutations. To test the hypothesis that mtDNA deletions induce chimeric RNA transcripts, we treated 18 mo and 34 mo rats with guanidinopropionic acid to induce high levels of skeletal muscle mtDNA deletion mutations. With mtDNA deletion induction, we demonstrate that the chimeric mitochondrial transcript frequency also increased and correlated strongly with an orthogonal DNA-based mutation assay performed on identical samples. Further, we show that the frequency of chimeric mitochondrial transcripts predicts expression of both nuclear and mitochondrial genes central to mitochondrial function, demonstrating the utility of these events as metrics of age-induced metabolic change. Mapping and quantitation of chimeric mitochondrial RNAs provides an accessible, orthogonal approach to DNA-based mutation assays, offers a potential method for identifying mitochondrial pathology in widely accessible datasets, and opens a new area of study in mitochondrial genetics and transcriptomics.
Citations: 0
A deconvolution framework that uses single-cell sequencing plus a small benchmark dataset for accurate analysis of cell type ratios in complex tissue samples
IF 7 Zone 2 Biology Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2024-11-25 DOI: 10.1101/gr.278822.123
Shuai Guo, Xiaoqian Liu, Xuesen Cheng, Yujie Jiang, Shuangxi Ji, Qingnan Liang, Andrew Koval, Yumei Li, Leah A. Owen, Ivana K. Kim, Ana Aparicio, Sanghoon Lee, Anil K. Sood, Scott Kopetz, John Paul Shen, John N. Weinstein, Margaret M. DeAngelis, Rui Chen, Wenyi Wang
Bulk deconvolution with single-cell/nucleus RNA-seq data is critical for understanding heterogeneity in complex biological samples, yet the technological discrepancy across sequencing platforms limits deconvolution accuracy. To address this, we utilize an experimental design to match inter-platform biological signals, hence revealing the technological discrepancy, and then develop a deconvolution framework called DeMixSC using this well-matched, i.e., benchmark, data. Built upon a novel weighted nonnegative least-squares framework, DeMixSC identifies and adjusts genes with high technological discrepancy and aligns the benchmark data with large patient cohorts of matched-tissue-type for large-scale deconvolution. Our results using two benchmark datasets of healthy retinas and ovarian cancer tissues suggest much-improved deconvolution accuracy. Leveraging tissue-specific benchmark datasets, we applied DeMixSC to a large cohort of 453 age-related macular degeneration patients and a cohort of 30 ovarian cancer patients with various responses to neoadjuvant chemotherapy. Only DeMixSC successfully unveiled biologically meaningful differences across patient groups, demonstrating its broad applicability in diverse real-world clinical scenarios. Our findings reveal the impact of technological discrepancy on deconvolution performance and underscore the importance of a well-matched dataset to resolve this challenge. The developed DeMixSC framework is generally applicable for accurately deconvolving large cohorts of disease tissues, including cancers, when a well-matched benchmark dataset is available.
Citations: 0