Genome Research: latest articles

Haplotype-aware sequence alignment to pangenome graphs.
IF 6.2 | Zone 2 (Biology) | Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2024-10-11 | DOI: 10.1101/gr.279143.124
Ghanshyam Chandra, Daniel Gibney, Chirag Jain

Modern pangenome graphs are built using haplotype-resolved genome assemblies. When mapping reads to a pangenome graph, prioritizing alignments that are consistent with the known haplotypes improves genotyping accuracy. However, the existing rigorous formulations for colinear chaining and alignment problems do not consider the haplotype paths in a pangenome graph. This often leads to spurious read alignments to those paths that are unlikely recombinations of the known haplotypes. In this paper, we develop novel formulations and algorithms for sequence-to-graph alignment and chaining problems. Inspired by the genotype imputation models, we assume that a query sequence is an imperfect mosaic of reference haplotypes. Accordingly, we introduce a recombination penalty in the scoring functions for each haplotype switch. First, we solve haplotype-aware sequence-to-graph alignment in O(|Q||E||H|) time, where Q is the query sequence, E is the set of edges, and H is the set of haplotypes represented in the graph. To complement our solution, we prove that an algorithm significantly faster than O(|Q||E||H|) is impossible under the strong exponential time hypothesis (SETH). Second, we propose a haplotype-aware chaining algorithm that runs in O(|H|N log |H|N) time after graph preprocessing, where N is the count of input anchors. We then establish that a chaining algorithm significantly faster than O(|H|N) is impossible under SETH. As a proof-of-concept, we implemented our chaining algorithm in the Minichain aligner. By aligning sequences sampled from the human major histocompatibility complex (MHC) to a pangenome graph of 60 MHC haplotypes, we demonstrate that our algorithm achieves better consistency with ground-truth recombinations compared with a haplotype-agnostic algorithm.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11529843/pdf/
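The idea of scoring a query as an imperfect mosaic of reference haplotypes, with a penalty for each haplotype switch, can be illustrated with a toy dynamic program. This is only a sketch under strong simplifying assumptions and not the algorithm from the paper or from Minichain: the haplotypes are plain strings of the same length as the query (a linear panel rather than a graph), only substitutions are scored, and the names `mosaic_cost`, `switch_penalty`, and `mismatch_cost` are hypothetical.

```python
def mosaic_cost(query, haplotypes, switch_penalty, mismatch_cost=1):
    """Minimum cost of copying `query` from a mosaic of `haplotypes`.

    Toy model (an assumption of this sketch): all sequences have equal
    length, so position i of the query is always compared with position i
    of some haplotype. Each mismatch costs `mismatch_cost`; each change of
    the copied haplotype costs `switch_penalty`.
    """
    if not haplotypes:
        raise ValueError("at least one haplotype is required")
    if any(len(h) != len(query) for h in haplotypes):
        raise ValueError("toy model: all sequences must have equal length")

    # cost[h] = best cost of the prefix processed so far, ending on haplotype h
    cost = [0] * len(haplotypes)
    for i, base in enumerate(query):
        best_prev = min(cost)  # cheapest prefix on any haplotype
        cost = [
            # either stay on haplotype h, or switch to it and pay the penalty
            min(cost[h], best_prev + switch_penalty)
            + (0 if hap[i] == base else mismatch_cost)
            for h, hap in enumerate(haplotypes)
        ]
    return min(cost)


panel = ["AAAAAA", "CCCCCC"]
print(mosaic_cost("AAACCC", panel, switch_penalty=1))   # one switch is cheaper
print(mosaic_cost("AAACCC", panel, switch_penalty=10))  # three mismatches are cheaper
```

Keeping a running minimum over the previous column makes each query position cost O(|H|) in this toy setting, so the whole loop is O(|Q||H|). A larger `switch_penalty` pushes the best solution toward staying on a single known haplotype, which is the behavior the abstract describes as prioritizing alignments consistent with known haplotypes.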
Citations: 0
Genetics-driven risk predictions leveraging the Mendelian randomization framework
IF 7 | Zone 2 (Biology) | Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2024-09-27 | DOI: 10.1101/gr.279252.124
Daniel Sens, Liubov Shilova, Ludwig Gräf, Maria Grebenshchikova, Bjoern M. Eskofier, Francesco Paolo Casale
Accurate predictive models of future disease onset are crucial for effective preventive healthcare, yet longitudinal data sets linking early risk factors to subsequent health outcomes are limited. To overcome this challenge, we introduce a novel framework, Predictive Risk modeling using Mendelian Randomization (PRiMeR), which utilizes genetic effects as supervisory signals to learn disease risk predictors without relying on longitudinal data. To do so, PRiMeR leverages risk factors and genetic data from a healthy cohort, along with results from genome-wide association studies of diseases of interest. After training, the learned predictor can be used to assess risk for new patients solely based on risk factors. We validate PRiMeR through comprehensive simulations and in future type 2 diabetes predictions in UK Biobank participants without diabetes, using follow-up onset labels for validation. Moreover, we apply PRiMeR to predict future Alzheimer's disease onset from brain imaging biomarkers and future Parkinson's disease onset from accelerometer-derived traits. Overall, with PRiMeR we offer a new perspective in predictive modeling, showing it is possible to learn risk predictors leveraging genetics rather than longitudinal data.
Citations: 0
Targeted and complete genomic sequencing of the Major Histocompatibility Complex in haplotypic form of individual heterozygous samples
IF 7 | Zone 2 (Biology) | Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2024-09-26 | DOI: 10.1101/gr.278588.123
Taishan Hu, Timothy L. Mosbruger, Nikolaos G. Tairis, Amalia Dinou, Pushkala Jayaraman, Mahdi Sarmady, Kingham Brewster, Yang Li, Tristan J. Hayeck, Jamie L. Duke, Dimitri S. Monos
The human Major Histocompatibility Complex (MHC) is an approximately 4 Mb genomic segment on Chromosome 6 that plays a pivotal role in the immune response. Despite its importance in various traits and diseases, its complex nature makes it challenging to accurately characterize on a routine basis. We present a novel approach allowing targeted sequencing and de novo haplotypic assembly of the MHC region in heterozygous samples, using long-read sequencing technologies. Our approach is validated using two reference samples, two family trios, and an African-American sample. We achieved excellent coverage (96.6-99.9% with at least 30× depth) and high accuracy (99.89-99.99%) for the different haplotypes. This methodology offers a reliable and cost-effective method for sequencing and fully characterizing the MHC without the need for whole-genome sequencing, facilitating broader studies on this important genomic segment and having significant implications in immunology, genetics and medicine.
Citations: 0
Evolutionary dynamics of polyadenylation signals and their recognition strategies in protists
IF 7 | Zone 2 (Biology) | Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2024-09-26 | DOI: 10.1101/gr.279526.124
Marcin P Sajek, Danielle Y Bilodeau, Michael A Beer, Emma Horton, Yukiko Miyamoto, Katrina B Velle, Lars Eckmann, Lillian Fritz-Laylin, Olivia S Rissland, Neelanjan Mukherjee
The poly(A) signal, together with auxiliary elements, directs cleavage of a pre-mRNA and thus determines the 3' end of the mature transcript. In many species, including humans, the poly(A) signal is an AAUAAA hexamer, but we recently found that the deeply branching eukaryote Giardia lamblia uses a distinct hexamer (AGURAA) and lacks any known auxiliary elements. Our discovery prompted us to explore the evolutionary dynamics of poly(A) signals and auxiliary elements in the eukaryotic kingdom. We used direct RNA sequencing to determine poly(A) signals for four protists within the Metamonada clade (which also contains Giardia lamblia) and two outgroup protists. These experiments revealed that the AAUAAA hexamer serves as the poly(A) signal in at least four different eukaryotic clades, indicating that it is likely the ancestral signal, whereas the unusual Giardia version is derived. We found that the use and relative strengths of auxiliary elements are also surprisingly plastic; in fact, within Metamonada, species like Giardia lamblia make use of a previously unrecognized auxiliary element where nucleotides flanking the poly(A) signal itself specify genuine cleavage sites. Thus, despite the fundamental nature of pre-mRNA cleavage for the expression of all protein-coding genes, the motifs controlling this process are dynamic on evolutionary timescales, providing motivation for future biochemical and structural studies as well as new therapeutic angles to target eukaryotic pathogens.
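The two hexamers named in the abstract, AAUAAA and AGURAA, can be located in an RNA sequence with a small pattern scan, reading R as the IUPAC code for a purine (A or G). The sketch below is illustrative only and is not the authors' analysis pipeline; the helper name `motif_positions`, the reduced `IUPAC` table, and the example sequences are assumptions made for this example.

```python
import re

# Subset of IUPAC nucleotide codes, written for RNA (U instead of T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]",    # purine
    "Y": "[CU]",    # pyrimidine
    "N": "[ACGU]",  # any base
}


def motif_positions(rna, motif):
    """Return 0-based start positions of every (possibly overlapping) match."""
    pattern = "".join(IUPAC[ch] for ch in motif)
    # A lookahead consumes no characters, so overlapping hits are all reported.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", rna)]


print(motif_positions("GGAAUAAAGG", "AAUAAA"))          # canonical hexamer
print(motif_positions("CCAGUAAACCAGUGAAGG", "AGURAA"))  # Giardia-type hexamer
```

A scan like this only finds candidate signals. Deciding which candidates actually direct cleavage needs experimental 3' end data such as the direct RNA sequencing described above.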
Citations: 0
Designing realistic regulatory DNA with autoregressive language models
IF 7 | Zone 2 (Biology) | Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2024-09-25 | DOI: 10.1101/gr.279142.124
Avantika Lal, David Garfield, Tommaso Biancalani, Gokcen Eraslan
Cis-regulatory elements (CREs), such as promoters and enhancers, are DNA sequences that regulate the expression of genes. The activity of a CRE is influenced by the order, composition, and spacing of sequence motifs that are bound by proteins called transcription factors (TFs). Synthetic CREs with specific properties are needed for biomanufacturing as well as for many therapeutic applications including cell and gene therapy. Here, we present regLM, a framework to design synthetic CREs with desired properties, such as high, low, or cell type–specific activity, using autoregressive language models in conjunction with supervised sequence-to-function models. We used our framework to design synthetic yeast promoters and cell type–specific human enhancers. We demonstrate that the synthetic CREs generated by our approach are not only predicted to have the desired functionality but also contain biological features similar to experimentally validated CREs. regLM thus facilitates the design of realistic regulatory DNA elements while providing insights into the cis-regulatory code.
Citations: 0
Mutational scanning of CRX classifies clinical variants and reveals biochemical properties of the transcriptional effector domain
IF 7 | Zone 2 (Biology) | Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2024-09-25 | DOI: 10.1101/gr.279415.124
James L Shepherdson, David M Granas, Jie Li, Zara Shariff, Stephen P Plassmeyer, Alex S Holehouse, Michael A White, Barak A Cohen
The transcription factor (TF) cone-rod homeobox (CRX) is essential for the differentiation and maintenance of photoreceptor cell identity. Several human CRX variants cause degenerative retinopathies, but most are variants of uncertain significance (VUS). We performed a deep mutational scan (DMS) of nearly all possible single amino acid substitutions in CRX using a cell-based transcriptional reporter assay, curating a high-confidence list of nearly 2,000 variants with altered transcriptional activity. In the structured homeodomain, activity scores closely aligned to a predicted structure and demonstrated position-specific constraints on amino acid substitution. By contrast, the intrinsically disordered transcriptional effector domain displayed a qualitatively different pattern of substitution effects, following compositional constraints without specific residue position requirements in the peptide chain. These compositional constraints were consistent with the acidic exposure model of transcriptional activation. We evaluated the performance of the DMS assay as a clinical variant classification tool using gold-standard classified human variants from ClinVar, identifying pathogenic variants with high specificity and moderate sensitivity. That this performance could be achieved using a synthetic reporter assay in a foreign cell type, even for a highly cell type-specific TF like CRX, suggests that this approach shows promise for DMS of other TFs that function in cell types that are not easily accessible. Together, the results of the CRX DMS identify molecular features of the CRX effector domain and demonstrate utility for integration into the clinical variant classification pipeline.
Citations: 0
AGAP duplicons associate with structural diversity at Chromosome 10q11.22
IF 7 | Zone 2 (Biology) | Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2024-09-25 | DOI: 10.1101/gr.279454.124
Stefania Fornezza, Vincenza Simona Delvecchio, William T Harvey, Philip C Dishuck, Evan E Eichler, Giuliana Giannuzzi
The 10q11.22 chromosomal region is a duplication-rich interval of the human genome and one of the last to be fully assembled. It carries copy-number variable genes associated with intellectual disability, bipolar disorder, and obesity. In this study, we characterized the structural diversity at this locus by analyzing 64 haploid assemblies produced by the Human Pangenome Reference Consortium. We identified eleven alternative haplotypes that differ in the copy number and/or orientation of large genomic segments, ranging from hundreds of kilobase pairs (kbp) to over one megabase pair (Mbp). We uncovered a 2.4 Mbp size difference between the shortest and longest haplotypes. Breakpoint analysis revealed that genomic instability results from nonallelic homologous recombination between segmental duplication (SD) pairs with varying similarity (94.4-99.6%). Nonetheless, these pairs generally recombine at positions where their identity is higher (>99.6%). Recurrent inversions occur with varying breakpoints within the same inverted SD pair. Inversion polymorphisms shuffle the entire SD arrangement, creating new predispositions to copy-number variations. The SD architecture is associated with a catarrhine-specific subgroup of the AGAP gene family, which likely triggered the accumulation of SDs at this locus over the past 25 million years of human evolution. Our results reveal extensive structural diversity and genomic instability at the 10q11.22 locus and expand the general understanding of the mutational mechanisms behind SD-mediated rearrangements.
Citations: 0
Rapid SARS-COV2 surveillance using clinical, pooled, or wastewater sequence as a sensor for population change
IF 7 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2024-09-25 DOI: 10.1101/gr.278594.123
Apurva Narechania, Dean Bobo, Kevin Deitz, Rob DeSalle, Paul Planet, Barun Mathema
The COVID-19 pandemic has highlighted the critical role of genomic surveillance for guiding policy and control. Timeliness is key, but sequence alignment and phylogeny slow most surveillance techniques. Millions of SARS-CoV-2 genomes have been assembled. Phylogenetic methods are ill-equipped to handle this sheer scale. We introduce a pangenomic measure that examines the information diversity of a k-mer library drawn from a country's complete set of clinical, pooled, or wastewater sequence. Quantifying diversity is central to ecology. Hill numbers, or the effective number of species in a sample, provide a simple metric for comparing species diversity across environments. The more diverse the sample, the higher the Hill number. We adopt this ecological approach and consider each k-mer an individual and each genome a transect in the pangenome of the species. Structured in this way, Hill numbers summarize the temporal trajectory of pandemic variants, collapsing each day's assemblies into genome equivalents. For pooled or wastewater sequence, we instead compare days using survey sequence divorced from individual infections. Across data from the UK, USA, and South Africa, we trace the ascendance of new variants of concern as they emerge in local populations well before these variants are named and added to phylogenetic databases. Using data from San Diego wastewater, we monitor these same population changes from raw, unassembled sequence. This history of emerging variants senses all available data as it is sequenced, intimating variant sweeps to dominance or declines to extinction at the leading edge of the COVID-19 pandemic.
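As a rough illustration of the diversity measure the abstract describes, the sketch below computes a Hill number of order q from k-mer counts using the standard ecological definition, D_q = (sum of p_i^q)^(1/(1-q)), with the q = 1 case taken as the exponential of Shannon entropy. It is not the authors' implementation: the function names `kmer_counts` and `hill_number`, the toy sequences, and the choice of k are all assumptions made for the example, and the paper's conversion to genome equivalents is not reproduced.

```python
import math
from collections import Counter


def kmer_counts(sequences, k):
    """Count every overlapping k-mer across a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def hill_number(counts, q):
    """Hill number of order q for a mapping of item -> abundance.

    q = 0 gives richness, q = 1 the exponential of Shannon entropy,
    q = 2 the inverse Simpson index.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must contain at least one observation")
    props = [c / total for c in counts.values() if c > 0]
    if q == 1:
        # Limit of the general formula as q approaches 1.
        entropy = -sum(p * math.log(p) for p in props)
        return math.exp(entropy)
    return sum(p ** q for p in props) ** (1.0 / (1.0 - q))


if __name__ == "__main__":
    # Hypothetical toy "day" of sequences; real input would be genomes or reads.
    day = ["ACGTACGTAC", "ACGTTCGTAC", "ACGTACGAAC"]
    counts = kmer_counts(day, k=4)
    for q in (0, 1, 2):
        print(q, hill_number(counts, q))
```

Higher orders of q weight abundant k-mers more heavily, so comparing the values at q = 0, 1, and 2 shows how much of a sample's diversity comes from rare k-mers.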
Citations: 0
Contrasting and combining transcriptome complexity captured by short and long RNA sequencing reads
IF 7 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2024-09-25 DOI: 10.1101/gr.278659.123
Seong W Han, San Jewell, Andrei Thomas-Tikhonenko, Yoseph Barash
Mapping transcriptomic variations using either short- or long-read RNA sequencing is a staple of genomic research. Long reads are able to capture entire isoforms and overcome repetitive regions, while short reads still provide higher coverage and lower error rates. Yet how to compare the technologies quantitatively, whether they can be combined, and what benefit such a combined view may offer remain open questions. We tackle these questions by first creating a pipeline to assess matched long- and short-read data using a variety of transcriptome statistics. We find that across datasets, algorithms, and technologies, matched short-read data detect roughly 30% more splice junctions, such that 10-30% of the splice junctions included at 20% or more by short reads are missed by long reads. In contrast, long reads detect many more intron retention events and can detect full isoforms, pointing to the benefit of combining the technologies. We introduce MAJIQ-L, an extension of the MAJIQ software that enables a unified view of transcriptome variations from both technologies, and demonstrate its benefits. Our software can be used to assess any future long-read technology or algorithm and to combine it with short-read data for improved transcriptome analysis.
Citations: 0
Systematic identification of interchromosomal interaction networks supports the existence of specialized RNA factories
IF 7 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2024-09-25 DOI: 10.1101/gr.278327.123
Borislav H Hristov, William Stafford Noble, Alessandro Bertero
Most studies of genome organization have focused on intrachromosomal (cis) contacts because they harbor key features such as DNA loops and topologically associating domains. Interchromosomal (trans) contacts have received much less attention, and tools for interrogating potentially biologically relevant trans structures are lacking. Here, we develop a computational framework that uses Hi-C data to identify sets of loci that jointly interact in trans. This method, trans-C, initiates probabilistic random walks with restarts from a set of seed loci to traverse an input Hi-C contact network, thereby identifying sets of trans-contacting loci. We validate trans-C in three increasingly complex models of established trans contacts: the Plasmodium falciparum var genes, the mouse olfactory receptor "Greek islands", and the human RBM20 cardiac splicing factory. We then apply trans-C to systematically test the hypothesis that genes coregulated by the same trans-acting element (i.e., a transcription or splicing factor) colocalize in three dimensions to form "RNA factories" that maximize the efficiency and accuracy of RNA biogenesis. We find that many loci with multiple binding sites of the same DNA-binding proteins interact with one another in trans, especially those bound by factors with intrinsically disordered domains. Similarly, clustered binding of a subset of RNA-binding proteins correlates with trans interaction of the encoding loci. Intriguingly, we observe that these trans-interacting loci are close to nuclear speckles. Our findings support the existence of trans-interacting chromatin domains (TIDs) driven by RNA biogenesis. Trans-C provides an efficient computational framework for studying these and other types of trans interactions, empowering studies of a poorly understood aspect of genome architecture.
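The random walk with restart that the abstract mentions can be sketched generically as follows. This is a textbook formulation, not the trans-C code: the function name `random_walk_with_restart`, the `restart` probability, the dense list-of-lists contact matrix, and the handling of loci with no contacts are all assumptions for illustration. In a trans-C-like setting the matrix would presumably hold only interchromosomal Hi-C contact weights between genomic bins.

```python
def random_walk_with_restart(weights, seeds, restart=0.3, tol=1e-12, max_iter=10_000):
    """Stationary visiting probabilities of a walk that restarts at the seeds.

    weights: square list of lists of non-negative contact weights.
    seeds:   indices of the seed loci; restarts are uniform over them.
    """
    n = len(weights)
    seeds = sorted(set(seeds))
    row_sums = [sum(row) for row in weights]

    # Restart distribution: uniform over the seed loci.
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)

    p = p0[:]
    for _ in range(max_iter):
        # With probability `restart`, jump back to a seed.
        nxt = [restart * x for x in p0]
        for i in range(n):
            if p[i] == 0.0:
                continue
            walk_mass = (1.0 - restart) * p[i]
            if row_sums[i] == 0:
                # Assumption: a locus with no contacts sends its mass back to the seeds.
                for j in range(n):
                    nxt[j] += walk_mass * p0[j]
                continue
            # Otherwise step to a neighbor in proportion to contact weight.
            for j in range(n):
                if weights[i][j]:
                    nxt[j] += walk_mass * weights[i][j] / row_sums[i]
        delta = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if delta < tol:
            break
    return p


if __name__ == "__main__":
    # Hypothetical 4-locus network: strong 0-1 and 2-3 contacts, weak 1-2 contact.
    contacts = [
        [0, 5, 0, 0],
        [5, 0, 1, 0],
        [0, 1, 0, 5],
        [0, 0, 5, 0],
    ]
    scores = random_walk_with_restart(contacts, seeds=[0])
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    print(ranked, scores)
```

Loci with the highest stationary probability are the ones most tightly connected to the seed set, which is the intuition behind ranking candidate trans-contacting loci this way.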
Citations: 0