Wenyu Zhang, Anja Guenther, Yuanxiao Gao, Kristian Ullrich, Bruno Huettel, Aftab Ahmad, Lei Duan, Kaizong Wei, Diethard Tautz
The ability to generate multiple RNA transcript isoforms from the same gene is a general phenomenon in eukaryotes. However, the complexity and diversity of alternative isoforms in natural populations remain largely unexplored. Using a newly developed full-length transcripts enrichment protocol with 5' CAP selection, we sequenced full-length RNA transcripts of 48 individuals from outbred populations and subspecies of Mus musculus, and from the closely related sister species Mus spretus and Mus spicilegus as outgroups. The dataset represents the most extensive full-length high-quality isoform catalog at the population level to date. In total, we reliably identified 117,728 distinct isoforms, of which only 51% were previously annotated. We show that the population-specific distribution pattern of isoforms is phylogenetically informative and reflects the segregating SNP diversity between the populations. We find that ancient housekeeping genes are a major source of the overall isoform diversity, and that the generation of alternative first exons plays a major role in generating new isoforms. Given that our data allow us to distinguish between population-specific isoforms and isoforms that are conserved across multiple populations, it is possible to refine the annotation of the reference mouse genome to a set of about 40,000 isoforms that should be most relevant for comparative functional analysis across species.
{"title":"Full-length RNA transcript sequencing traces brain isoform diversity in house mouse natural populations","authors":"Wenyu Zhang, Anja Guenther, Yuanxiao Gao, Kristian Ullrich, Bruno Huettel, Aftab Ahmad, Lei Duan, Kaizong Wei, Diethard Tautz","doi":"10.1101/gr.279166.124","DOIUrl":"https://doi.org/10.1101/gr.279166.124","journal":{"name":"Genome research"},"publicationDate":"2024-09-17","publicationTypes":"Journal Article"}
Joshua Lee, Elizabeth A Snell, Joanne Brown, Charlotte Elizabeth Booth, Rosamonde E Banks, Daniel J Turner, Naveen Vasudev, Dimitris Lagos
The use of long-read direct RNA sequencing (DRS) and PCR cDNA sequencing (PCS) in clinical oncology remains limited, with no direct comparison between the two methods. We used DRS and PCS to study clear cell renal cell carcinoma (ccRCC), focussing on new transcript and gene discovery. Twelve primary ccRCC archival tumors, six from patients who went on to relapse, were analysed. Results were validated in an independent cohort of twenty patients by qRT-PCR and compared to DRS analysis of RCC4 cells. Owing to long-term storage, the average read length in archival clinical samples was lower (400-500 nt) than that achieved through DRS of RCC4 cells (>1100 nt). Still, deconvolution analysis showed a loss of immune infiltrate in primary tumors of patients who relapse, as reported by others. Differentially expressed genes in patients who went on to relapse were determined with good overlap between DRS and PCS, identifying LINC04216 and the T cell exhaustion marker TOX as novel candidate recurrence-associated genes. Novel transcript analysis revealed over 10,000 candidate novel transcripts detected by both methods and in ccRCC cells in vitro, including a novel CD274 (PD-L1) transcript encoding the soluble version of the protein, with a longer 3' UTR and lower stability than the annotated transcript. Both methods identified 414 novel genes, also detected in RCC4 cells, including a novel noncoding gene over-expressed in patients who relapse. Overall, we showcase the use of PCS and DRS in archival tumor samples to uncover unmapped features of cancer transcriptomes linked to disease progression and immune evasion.
{"title":"Long-read RNA sequencing of archival tissues reveals novel genes and transcripts associated with clear cell renal cell carcinoma recurrence and immune evasion","authors":"Joshua Lee, Elizabeth A Snell, Joanne Brown, Charlotte Elizabeth Booth, Rosamonde E Banks, Daniel J Turner, Naveen Vasudev, Dimitris Lagos","doi":"10.1101/gr.278801.123","DOIUrl":"https://doi.org/10.1101/gr.278801.123","journal":{"name":"Genome research"},"publicationDate":"2024-09-16","publicationTypes":"Journal Article"}
Sena A Gocuk, James Lancaster, Shian Su, Jasleen K Jolly, Thomas L Edwards, Doron G Hickey, Matthew E Ritchie, Marnie E Blewitt, Lauren N Ayton, Quentin Gouil
X-linked genetic disorders typically affect females less severely than males due to the presence of a second X Chromosome not carrying the deleterious variant. However, the phenotypic expression in females is highly variable, which may be explained by an allelic skew in X-Chromosome inactivation. Accurate measurement of X inactivation skew is crucial for understanding and predicting disease phenotype in carrier females, with prediction especially relevant for degenerative conditions. We propose a novel approach using nanopore sequencing to quantify skewed X inactivation accurately. By phasing sequence variants and methylation patterns, this single assay reveals the disease variant, the X inactivation skew, and its directionality, and it is applicable to all patients and X-linked variants. Enrichment of X Chromosome reads through adaptive sampling enhances cost-efficiency. Our study includes a cohort of 16 X-linked variant carrier females affected by two X-linked inherited retinal diseases: choroideremia and RPGR-associated retinitis pigmentosa. As retinal DNA cannot be readily obtained, we instead determine the skew from peripheral samples (blood, saliva, and buccal mucosa) and correlate it with phenotypic outcomes. This revealed a strong correlation between X inactivation skew and disease presentation, confirming the value of performing this assay and its potential as a way to prioritise patients for early intervention, such as the gene therapies currently in clinical trials for these conditions. Our method of assessing skewed X inactivation is applicable to all long-read genomic datasets, providing insights into disease risk and severity and aiding in the development of individualised strategies for X-linked variant carrier females.
{"title":"Measuring X inactivation skew for X-linked diseases with adaptive nanopore sequencing","authors":"Sena A Gocuk, James Lancaster, Shian Su, Jasleen K Jolly, Thomas L Edwards, Doron G Hickey, Matthew E Ritchie, Marnie E Blewitt, Lauren N Ayton, Quentin Gouil","doi":"10.1101/gr.279396.124","DOIUrl":"https://doi.org/10.1101/gr.279396.124","journal":{"name":"Genome research"},"publicationDate":"2024-09-16","publicationTypes":"Journal Article"}
Alicja Pacholewska, Matthias Lienhard, Mirko Brueggemann, Heike Haenel, Lorina Bilalli, Anja Koenigs, Felix Hess, Kerstin Becker, Karl Koehrer, Jesko Fabian Kaiser, Holger Gohlke, Norbert Gattermann, Michael Hallek, Carmen Diana Herling, Julian Koenig, Christina Grimm, Ralf Herwig, Kathi Zarnack, Michal R. Schweiger
Mutations in splicing factor 3B subunit 1 (SF3B1) frequently occur in patients with chronic lymphocytic leukemia (CLL) and myelodysplastic syndromes (MDS). These mutations have different effects on disease prognosis, with a beneficial effect in MDS and a worse prognosis in CLL patients. A full-length transcriptome approach can expand our knowledge of the effects of SF3B1 mutations on RNA splicing and their contribution to patient survival and treatment options. We applied long-read transcriptome sequencing (LRTS) to 44 MDS and CLL patients, as well as two pairs of isogenic cell lines with and without SF3B1 mutations, and found that >60% of the detected isoforms were novel. Splicing alterations were largely shared between cancer types and specifically affected the usage of introns and 3' splice sites. Our data highlighted a constrained window at canonical 3' splice sites in which dynamic splice site switches occurred in SF3B1-mutated patients. Using transcriptome-wide RNA binding maps and molecular dynamics simulations, we showed multimodal SF3B1 binding at 3' splice sites and predicted reduced RNA binding at the second binding pocket of the SF3B1 K700E mutant. Our work presents the hitherto most complete LRTS study of the SF3B1 mutation in CLL and MDS and provides a resource to study aberrant splicing in cancer. Moreover, we showed that the different disease prognoses most likely result from the different cell types expanded during carcinogenesis rather than from different mechanisms of action of the mutated SF3B1. These results have important implications for understanding the role of SF3B1 mutations in hematological malignancies and other related diseases.
{"title":"Long-read transcriptome sequencing of CLL and MDS patients uncovers molecular effects of SF3B1 mutations","authors":"Alicja Pacholewska, Matthias Lienhard, Mirko Brueggemann, Heike Haenel, Lorina Bilalli, Anja Koenigs, Felix Hess, Kerstin Becker, Karl Koehrer, Jesko Fabian Kaiser, Holger Gohlke, Norbert Gattermann, Michael Hallek, Carmen Diana Herling, Julian Koenig, Christina Grimm, Ralf Herwig, Kathi Zarnack, Michal R. Schweiger","doi":"10.1101/gr.279327.124","DOIUrl":"https://doi.org/10.1101/gr.279327.124","journal":{"name":"Genome research"},"publicationDate":"2024-09-13","publicationTypes":"Journal Article"}
Gregor Diensthuber, Leszek P Pryszcz, Laia Llovera, Morghan C Lucas, Anna Delgado-Tejedor, Sonia Cruciani, Jean-Yves Roignant, Oguzhan Begik, Eva Maria Novoa
In recent years, nanopore direct RNA sequencing (DRS) has become a valuable tool for studying the epitranscriptome, due to its ability to detect multiple modifications within the same full-length native RNA molecules. While RNA modifications can be identified in the form of systematic basecalling 'errors' in DRS datasets, N6-methyladenosine (m6A) modifications produce relatively low 'errors' compared to other RNA modifications, limiting the applicability of this approach to m6A sites that are modified at high stoichiometries. Here, we demonstrate that the use of alternative RNA basecalling models, trained with fully unmodified sequences, increases the 'error' signal of m6A, leading to enhanced detection and improved sensitivity even at low stoichiometries. Moreover, we find that high-accuracy alternative RNA basecalling models can show up to 97% median basecalling accuracy, outperforming currently available RNA basecalling models, which show 91% median basecalling accuracy. Notably, the use of high-accuracy basecalling models is accompanied by a significant increase in the number of mapped reads, especially in shorter RNA fractions, and by increased basecalling error signatures at pseudouridine (Ψ) and N1-methylpseudouridine (m1Ψ) modified sites. Overall, our work demonstrates that alternative RNA basecalling models can be used to improve the detection of RNA modifications, read mappability, and basecalling accuracy in nanopore DRS datasets.
{"title":"Enhanced detection of RNA modifications and read mapping with high-accuracy nanopore RNA basecalling models","authors":"Gregor Diensthuber, Leszek P Pryszcz, Laia Llovera, Morghan C Lucas, Anna Delgado-Tejedor, Sonia Cruciani, Jean-Yves Roignant, Oguzhan Begik, Eva Maria Novoa","doi":"10.1101/gr.278849.123","DOIUrl":"https://doi.org/10.1101/gr.278849.123","journal":{"name":"Genome research"},"publicationDate":"2024-09-13","publicationTypes":"Journal Article"}
Suleyman Gulsuner, Amal AbuRayyan, Jessica B. Mandell, Ming K. Lee, Greta V. Bernier, Barbara M. Norquist, Sarah B. Pierce, Mary-Claire King, Tom Walsh
The vast majority of deeply intronic genomic variants are benign, but some extremely rare or private deep intronic variants lead to exonification of intronic sequence with abnormal transcriptional consequences. Damaging variants of this class are likely underreported as causes of disease for several reasons: Most clinical DNA and RNA testing does not include full intronic sequences; many of these variants lie in complex repetitive regions that cannot be aligned from short-read whole-genome sequence; and, until recently, consequences of deep intronic variants were not accurately predicted by in silico tools. We evaluated the frequency and consequences of rare deep intronic variants for families severely affected with breast, ovarian, pancreatic, and/or metastatic prostate cancer, but with no causal variant identified by any previous genomic or cDNA-based approach. For 10 tumor-suppressor genes, we used multiplexed adaptive sampling long-read DNA sequencing and cDNA sequencing, based on patient-derived DNA and RNA, to systematically evaluate deep intronic variation. We identified all variants across the full genomic loci of targeted genes, applied the in silico tools SpliceAI and Pangolin to predict variants of functional consequence, and then carried out long-read cDNA sequencing to identify aberrant transcripts. For eight of the 120 (6%) previously unsolved families, rare deep intronic variants in BRCA1, PALB2, and ATM create intronic pseudoexons that are spliced into transcripts, leading to premature truncations. These results suggest that long-read DNA and cDNA sequencing can be integrated into variant discovery, with strategies for accurately characterizing pathogenic variants.
{"title":"Long-read DNA and cDNA sequencing identify cancer-predisposing deep intronic variation in tumor-suppressor genes","authors":"Suleyman Gulsuner, Amal AbuRayyan, Jessica B. Mandell, Ming K. Lee, Greta V. Bernier, Barbara M. Norquist, Sarah B. Pierce, Mary-Claire King, Tom Walsh","doi":"10.1101/gr.279158.124","DOIUrl":"https://doi.org/10.1101/gr.279158.124","journal":{"name":"Genome research"},"publicationDate":"2024-09-13","publicationTypes":"Journal Article"}
Laura E. Tibbs-Cortes, Tingting Guo, Carson M. Andorf, Xianran Li, Jianming Yu
Maize phenotypes are plastic, determined by the complex interplay of genetics and environmental variables. Uncovering the genes responsible and understanding how their effects change across a large geographic region are challenging. In this study, we conducted systematic analysis to identify environmental indices that strongly influence 19 traits (including flowering time, plant architecture, and yield component traits) measured in the maize nested association mapping (NAM) population grown in 11 environments. Identified environmental indices based on day length, temperature, moisture, and combinations of these are biologically meaningful. Next, we leveraged a total of more than 20 million SNP and SV markers derived from recent de novo sequencing of the NAM founders for trait prediction and dissection. When combined with identified environmental indices, genomic prediction enables accurate performance predictions. Genome-wide association studies (GWASs) detected genetic loci associated with the plastic response to the identified environmental indices for all examined traits. By systematically uncovering the major environmental and genomic factors underlying phenotypic plasticity in a wide variety of traits and depositing our results as a track on the MaizeGDB genome browser, we provide a community resource as well as a comprehensive analytical framework to facilitate continuing complex trait dissection and prediction in maize and other crops. Our findings also provide a conceptual framework for the genetic architecture of phenotypic plasticity by accommodating two alternative models, regulatory gene model and allelic sensitivity model, as special cases of a continuum.
{"title":"Comprehensive identification of genomic and environmental determinants of phenotypic plasticity in maize","authors":"Laura E. Tibbs-Cortes, Tingting Guo, Carson M. Andorf, Xianran Li, Jianming Yu","doi":"10.1101/gr.279027.124","DOIUrl":"https://doi.org/10.1101/gr.279027.124","journal":{"name":"Genome research"},"publicationDate":"2024-09-13","publicationTypes":"Journal Article"}
Jan Müller, Christina Hartwig, Mirko Sonntag, Lisa Bitzer, Christopher Adelmann, Yevhen Vainshtein, Karolina Glanz, Sebastian O. Decker, Thorsten Brenner, Georg F. Weber, Arndt von Haeseler, Kai Sohn
Here, we present a method for enriching double-stranded cfDNA with an average length of ∼40 bp from total cfDNA for high-throughput DNA sequencing. This class of cfDNA is enriched at gene promoters and binding sites of transcription factors or structural DNA-binding proteins, so that a genome-wide DNA footprint is directly captured from liquid biopsies. In short double-stranded cfDNA from healthy individuals, we find significant enrichment of 203 transcription factor motifs. Additionally, short double-stranded cfDNA signals at specific genomic regions correlate negatively with DNA methylation and positively with H3K4me3 histone modifications and gene transcription. The diagnostic potential of short double-stranded cell-free DNA (cfDNA) in blood plasma has not yet been recognized. When comparing short double-stranded cfDNA from patient samples of pancreatic ductal adenocarcinoma with colorectal carcinoma, or from septic patients with postoperative controls, we identify 136 and 241 differentially enriched loci, respectively. Using these differentially enriched loci, the disease types can be clearly distinguished by principal component analysis, demonstrating the diagnostic potential of short double-stranded cfDNA signals as a new class of biomarkers for liquid biopsies.
{"title":"A novel approach for in vivo DNA footprinting using short double-stranded cell-free DNA from plasma","authors":"Jan Müller, Christina Hartwig, Mirko Sonntag, Lisa Bitzer, Christopher Adelmann, Yevhen Vainshtein, Karolina Glanz, Sebastian O. Decker, Thorsten Brenner, Georg F. Weber, Arndt von Haeseler, Kai Sohn","doi":"10.1101/gr.279326.124","DOIUrl":"https://doi.org/10.1101/gr.279326.124","url":null,"abstract":"Here, we present a method for enrichment of double-stranded cfDNA with an average length of ∼40 bp from cfDNA for high-throughput DNA sequencing. This class of cfDNA is enriched at gene promoters and binding sites of transcription factors or structural DNA-binding proteins, so that a genome-wide DNA footprint is directly captured from liquid biopsies. In short double-stranded cfDNA from healthy individuals, we find significant enrichment of 203 transcription factor motifs. Additionally, short double-stranded cfDNA signals at specific genomic regions correlate negatively with DNA methylation, positively with H3K4me3 histone modifications and gene transcription. The diagnostic potential of short double-stranded cell-free DNA (cfDNA) in blood plasma has not yet been recognized. When comparing short double-stranded cfDNA from patient samples of pancreatic ductal adenocarcinoma with colorectal carcinoma or septic with postoperative controls, we identify 136 and 241 differentially enriched loci, respectively. 
Using these differentially enriched loci, the disease types can be clearly distinguished by principal component analysis, demonstrating the diagnostic potential of short double-stranded cfDNA signals as a new class of biomarkers for liquid biopsies.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"71 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142231563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zane Kliesmete, Peter Orchard, Victor Yan Kin Lee, Johanna Geuder, Simon M. Krauß, Mari Ohnuki, Jessica Jocher, Beate Vieth, Wolfgang Enard, Ines Hellmann
Pleiotropy, measured as expression breadth across tissues, is one of the best predictors for protein sequence and expression conservation. In this study, we investigated its effect on the evolution of cis-regulatory elements (CREs). To this end, we carefully reanalyzed the Epigenomics Roadmap data for nine fetal tissues, assigning a measure of pleiotropic degree to nearly half a million CREs. To assess the functional conservation of CREs, we generated ATAC-seq and RNA-seq data from humans and macaques. We found that more pleiotropic CREs exhibit greater conservation in accessibility, and the mRNA expression levels of the associated genes are more conserved. This trend of higher conservation for higher degrees of pleiotropy persists when analyzing the transcription factor binding repertoire. In contrast, simple DNA sequence conservation of orthologous sites between species tends to be even lower for pleiotropic CREs than for species-specific CREs. Combining various lines of evidence, we propose that the lack of sequence conservation in functionally conserved pleiotropic CREs is due to within-element compensatory evolution. In summary, our findings suggest that pleiotropy is also a good predictor for the functional conservation of CREs, even though this is not reflected in the sequence conservation of pleiotropic CREs.
{"title":"Evidence for compensatory evolution within pleiotropic regulatory elements","authors":"Zane Kliesmete, Peter Orchard, Victor Yan Kin Lee, Johanna Geuder, Simon M. Krauß, Mari Ohnuki, Jessica Jocher, Beate Vieth, Wolfgang Enard, Ines Hellmann","doi":"10.1101/gr.279001.124","DOIUrl":"https://doi.org/10.1101/gr.279001.124","url":null,"abstract":"Pleiotropy, measured as expression breadth across tissues, is one of the best predictors for protein sequence and expression conservation. In this study, we investigated its effect on the evolution of <em>cis</em>-regulatory elements (CREs). To this end, we carefully reanalyzed the Epigenomics Roadmap data for nine fetal tissues, assigning a measure of pleiotropic degree to nearly half a million CREs. To assess the functional conservation of CREs, we generated ATAC-seq and RNA-seq data from humans and macaques. We found that more pleiotropic CREs exhibit greater conservation in accessibility, and the mRNA expression levels of the associated genes are more conserved. This trend of higher conservation for higher degrees of pleiotropy persists when analyzing the transcription factor binding repertoire. In contrast, simple DNA sequence conservation of orthologous sites between species tends to be even lower for pleiotropic CREs than for species-specific CREs. Combining various lines of evidence, we propose that the lack of sequence conservation in functionally conserved pleiotropic CREs is due to within-element compensatory evolution. 
In summary, our findings suggest that pleiotropy is also a good predictor for the functional conservation of CREs, even though this is not reflected in the sequence conservation of pleiotropic CREs.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"204 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142160429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shujun Ou, Armin Scheben, Tyler Collins, Yinjie Qiu, Arun S. Seetharam, Claire C. Menard, Nancy Manchanda, Jonathan I. Gent, Michael C. Schatz, Sarah N. Anderson, Matthew B. Hufford, Candice N. Hirsch
Much of the profound interspecific variation in genome content has been attributed to transposable elements (TEs). To explore the extent of TE variation within species, we developed an optimized open-source algorithm, panEDTA, to de novo annotate TEs in a pangenome context. We then generated a unified TE annotation for a maize pangenome derived from 26 reference-quality genomes, which reveals an excess of 35.1 Mb of TE sequences per genome in tropical maize relative to temperate maize. A small number (n = 216) of TE families, mainly LTR retrotransposons, drive these differences. Evidence from the methylome, transcriptome, LTR age distribution, and LTR insertional polymorphisms reveals that 64.7% of the variability is contributed by LTR families that are young, less methylated, and more expressed in tropical maize, whereas 18.5% is driven by LTR families with removal or loss in temperate maize. Additionally, we find enrichment for Young LTR families adjacent to nucleotide-binding and leucine-rich repeat (NLR) clusters of varying copy number across lines, suggesting TE activity may be associated with disease resistance in maize.
{"title":"Differences in activity and stability drive transposable element variation in tropical and temperate maize","authors":"Shujun Ou, Armin Scheben, Tyler Collins, Yinjie Qiu, Arun S. Seetharam, Claire C. Menard, Nancy Manchanda, Jonathan I. Gent, Michael C. Schatz, Sarah N. Anderson, Matthew B. Hufford, Candice N. Hirsch","doi":"10.1101/gr.278131.123","DOIUrl":"https://doi.org/10.1101/gr.278131.123","url":null,"abstract":"Much of the profound interspecific variation in genome content has been attributed to transposable elements (TEs). To explore the extent of TE variation within species, we developed an optimized open-source algorithm, panEDTA, to de novo annotate TEs in a pangenome context. We then generated a unified TE annotation for a maize pangenome derived from 26 reference-quality genomes, which reveals an excess of 35.1 Mb of TE sequences per genome in tropical maize relative to temperate maize. A small number (<em>n</em> = 216) of TE families, mainly LTR retrotransposons, drive these differences. Evidence from the methylome, transcriptome, LTR age distribution, and LTR insertional polymorphisms reveals that 64.7% of the variability is contributed by LTR families that are young, less methylated, and more expressed in tropical maize, whereas 18.5% is driven by LTR families with removal or loss in temperate maize. 
Additionally, we find enrichment for Young LTR families adjacent to nucleotide-binding and leucine-rich repeat (NLR) clusters of varying copy number across lines, suggesting TE activity may be associated with disease resistance in maize.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"9 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142160427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}