Mayank Murali, Jamie Saquing, Senbao Lu, Ziyang Gao, Emily F. Watts, Ben Jordan, Zachary Peters Wakefield, Ana Fiszbein, David R. Cooper, Peter J. Castaldi, Dmitry Korkin, Gloria M. Sheynkman
Long-read RNA-seq has shed light on transcriptomic complexity, but questions remain about the functionality of downstream protein products. We introduce Biosurfer, a computational approach for comparing protein isoforms while systematically tracking the transcriptional, splicing, and translational variations that underlie differences in the sequences of the protein products. Using Biosurfer, we analyzed the differences in 35,082 pairs of GENCODE-annotated protein isoforms, finding that a majority (70%) of variable N-termini are due to alternative transcription start sites, while only 9% arise from 5′ UTR alternative splicing (AS). Biosurfer's detailed tracking of nucleotide-to-residue relationships helps reveal an uncommonly tracked source of single amino acid residue changes arising from codon splits at junctions. For 17% of internal sequence changes, such split codon patterns lead to single residue differences, termed “ragged codons.” Of variable C-termini, 72% involve splice- or intron retention-induced reading frameshifts. We systematically characterize an unusual pattern of reading frame changes, in which the first frameshift is closely followed by a distinct second frameshift that restores the original frame, which we term a “snapback” frameshift. We analyze the long-read RNA-seq-predicted proteome of a human cell line and find trends similar to those in our GENCODE analysis, with the exception of a higher proportion of transcripts predicted to undergo nonsense-mediated decay. Biosurfer's comprehensive characterization of long-read RNA-seq data sets should accelerate insights into the functional roles of protein isoforms, providing a mechanistic explanation of the origins of the proteomic diversity driven by AS. Biosurfer is available as a Python package.
{"title":"Biosurfer for systematic tracking of regulatory mechanisms leading to protein isoform diversity","authors":"Mayank Murali, Jamie Saquing, Senbao Lu, Ziyang Gao, Emily F. Watts, Ben Jordan, Zachary Peters Wakefield, Ana Fiszbein, David R. Cooper, Peter J. Castaldi, Dmitry Korkin, Gloria M. Sheynkman","doi":"10.1101/gr.279317.124","DOIUrl":"https://doi.org/10.1101/gr.279317.124","url":null,"abstract":"Long-read RNA-seq has shed light on transcriptomic complexity, but questions remain about the functionality of downstream protein products. We introduce Biosurfer, a computational approach for comparing protein isoforms, while systematically tracking the transcriptional, splicing, and translational variations that underlie differences in the sequences of the protein products. Using Biosurfer, we analyzed the differences in 35,082 pairs of GENCODE annotated protein isoforms, finding a majority (70%) of variable N-termini are due to the alternative transcription start sites, while only 9% arise from 5′ UTR alternative splicing (AS). Biosurfer's detailed tracking of nucleotide-to-residue relationships helps reveal an uncommonly tracked source of single amino acid residue changes arising from the codon splits at junctions. For 17% of internal sequence changes, such split codon patterns lead to single residue differences, termed “ragged codons.” Of variable C-termini, 72% involve splice- or intron retention-induced reading frameshifts. We systematically characterize an unusual pattern of reading frame changes, in which the first frameshift is closely followed by a distinct second frameshift that restores the original frame, which we term a “snapback” frameshift. We analyze the long-read RNA-seq-predicted proteome of a human cell line and find similar trends as compared to our GENCODE analysis, with the exception of a higher proportion of transcripts predicted to undergo nonsense-mediated decay. 
Biosurfer's comprehensive characterization of long-read RNA-seq data sets should accelerate insights of the functional role of protein isoforms, providing mechanistic explanation of the origins of the proteomic diversity driven by the AS. Biosurfer is available as a Python package.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"32 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143627426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
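The “snapback” frameshift described in the Biosurfer abstract above can be illustrated with a toy calculation. The sketch below is not Biosurfer's API or algorithm. The function name `find_snapbacks`, the event representation (position plus net nucleotide length change relative to a reference isoform), and the `max_gap` threshold are all assumptions made for illustration. It tracks the cumulative reading-frame offset modulo 3 and reports cases where the frame is disrupted and then restored within a short distance.

```python
from typing import List, Tuple


def find_snapbacks(
    events: List[Tuple[int, int]], max_gap: int = 30
) -> List[Tuple[int, int]]:
    """Return (start, end) positions of hypothetical 'snapback' frameshifts.

    events:  (position, length_change_nt) tuples sorted by position, where
             length_change_nt is the net nucleotide gain (+) or loss (-)
             relative to a reference isoform (an assumed representation).
    max_gap: largest distance in nt between the disrupting and restoring
             events for the pair to count as a snapback (assumed threshold).
    """
    snapbacks = []
    frame = 0           # cumulative offset from the reference frame, mod 3
    shift_start = None  # position where the frame was last disrupted

    for pos, delta in events:
        new_frame = (frame + delta) % 3
        if frame == 0 and new_frame != 0:
            # First frameshift: the reading frame leaves the reference frame.
            shift_start = pos
        elif frame != 0 and new_frame == 0:
            # Second frameshift: the original frame is restored.
            if shift_start is not None and pos - shift_start <= max_gap:
                snapbacks.append((shift_start, pos))
            shift_start = None
        frame = new_frame

    return snapbacks


# A +1 nt change at position 100 is undone by a -1 nt change at 112;
# the +2 nt change at 500 is never restored, so it is not reported.
example = find_snapbacks([(100, 1), (112, -1), (500, 2)])
```

Events whose length change is a multiple of 3 leave the frame untouched and are ignored by this sketch. A real analysis would also need to account for stop codons encountered while out of frame.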
Qian Qin, Victoria Popic, Kirsty Wienand, Houlin Yu, Emily White, Akanksha Khorgade, Asa Shin, Christophe Georgescu, Catarina D. Campbell, Arthur Dondi, Niko Beerenwinkel, Francisca Vazquez, Aziz M. Al'Khafaji, Brian J. Haas
Gene fusions are found as cancer drivers in diverse adult and pediatric cancers. Accurate detection of fusion transcripts is essential in cancer clinical diagnostics and prognostics and for guiding therapeutic development. Most currently available methods for fusion transcript detection are compatible with Illumina RNA-seq involving highly accurate short-read sequences. Recent advances in long-read isoform sequencing enable the detection of fusion transcripts at unprecedented resolution in bulk and single-cell samples. Here, we developed a new computational tool, CTAT-LR-Fusion, to detect fusion transcripts from long-read RNA-seq with or without companion short reads, with applications to bulk or single-cell transcriptomes. We demonstrate that CTAT-LR-Fusion exceeds the fusion detection accuracy of alternative methods as benchmarked with simulated and genuine long-read RNA-seq. Using short- and long-read RNA-seq, we further apply CTAT-LR-Fusion to bulk transcriptomes of nine tumor cell lines and to tumor single cells derived from a melanoma sample and three metastatic high-grade serous ovarian carcinoma samples. In both bulk and single-cell RNA-seq, long isoform reads yield higher sensitivity for fusion detection than short reads, with notable exceptions. By combining short and long reads in CTAT-LR-Fusion, we are able to further maximize the detection of fusion splicing isoforms and fusion-expressing tumor cells.
{"title":"Accurate fusion transcript identification from long- and short-read isoform sequencing at bulk or single-cell resolution","authors":"Qian Qin, Victoria Popic, Kirsty Wienand, Houlin Yu, Emily White, Akanksha Khorgade, Asa Shin, Christophe Georgescu, Catarina D. Campbell, Arthur Dondi, Niko Beerenwinkel, Francisca Vazquez, Aziz M. Al'Khafaji, Brian J. Haas","doi":"10.1101/gr.279200.124","DOIUrl":"https://doi.org/10.1101/gr.279200.124","url":null,"abstract":"Gene fusions are found as cancer drivers in diverse adult and pediatric cancers. Accurate detection of fusion transcripts is essential in cancer clinical diagnostics and prognostics and for guiding therapeutic development. Most currently available methods for fusion transcript detection are compatible with Illumina RNA-seq involving highly accurate short-read sequences. Recent advances in long-read isoform sequencing enable the detection of fusion transcripts at unprecedented resolution in bulk and single-cell samples. Here, we developed a new computational tool, CTAT-LR-Fusion, to detect fusion transcripts from long-read RNA-seq with or without companion short reads, with applications to bulk or single-cell transcriptomes. We demonstrate that CTAT-LR-Fusion exceeds the fusion detection accuracy of alternative methods as benchmarked with simulated and genuine long-read RNA-seq. Using short- and long-read RNA-seq, we further apply CTAT-LR-Fusion to bulk transcriptomes of nine tumor cell lines and to tumor single cells derived from a melanoma sample and three metastatic high-grade serous ovarian carcinoma samples. In both bulk and single-cell RNA-seq, long isoform reads yield higher sensitivity for fusion detection than short reads with notable exceptions. 
By combining short and long reads in CTAT-LR-Fusion, we are able to further maximize the detection of fusion splicing isoforms and fusion-expressing tumor cells.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"6 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143627427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kai Otsuka, Akihiko Sakashita, So Maezawa, Richard M. Schultz, Satoshi H. Namekawa
As transposable elements (TEs) coevolved with the host genome, the host genome exploited TEs as functional regulatory elements of gene expression. Here we show that a subset of KRAB domain–containing zinc-finger proteins (KZFPs), which are highly expressed in mitotically dividing spermatogonia, repress the enhancer function of endogenous retroviruses (ERVs) and that the release from KZFP-mediated repression allows activation of ERV enhancers upon entry into meiosis. This regulatory feature is observed for independently evolved KZFPs and ERVs in mice and humans, suggesting evolutionary conservation in mammals. Further, we show that KZFP-targeted ERVs are underrepresented on the sex chromosomes in meiosis, suggesting that meiotic sex chromosome inactivation (MSCI) may antagonize the coevolution of KZFPs and ERVs in mammals. Our study uncovers a mechanism by which a subset of KZFPs regulate ERVs to sculpt germline transcriptomes. We propose that epigenetic programming during the transition from mitotic spermatogonia to meiotic spermatocytes facilitates the coevolution of KZFPs and TEs on autosomes and is antagonized by MSCI.
{"title":"KRAB zinc-finger proteins regulate endogenous retroviruses to sculpt germline transcriptomes and genome evolution","authors":"Kai Otsuka, Akihiko Sakashita, So Maezawa, Richard M. Schultz, Satoshi H. Namekawa","doi":"10.1101/gr.279924.124","DOIUrl":"https://doi.org/10.1101/gr.279924.124","url":null,"abstract":"As transposable elements (TEs) coevolved with the host genome, the host genome exploited TEs as functional regulatory elements of gene expression. Here we show that a subset of KRAB domain–containing zinc-finger proteins (KZFPs), which are highly expressed in mitotically dividing spermatogonia, repress the enhancer function of endogenous retroviruses (ERVs) and that the release from KZFP-mediated repression allows activation of ERV enhancers upon entry into meiosis. This regulatory feature is observed for independently evolved KZFPs and ERVs in mice and humans, suggesting evolutionary conservation in mammals. Further, we show that KZFP-targeted ERVs are underrepresented on the sex chromosomes in meiosis, suggesting that meiotic sex chromosome inactivation (MSCI) may antagonize the coevolution of KZFPs and ERVs in mammals. Our study uncovers a mechanism by which a subset of KZFPs regulate ERVs to sculpt germline transcriptomes. 
We propose that epigenetic programming during the transition from mitotic spermatogonia to meiotic spermatocytes facilitates the coevolution of KZFPs and TEs on autosomes and is antagonized by MSCI.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"208 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143608348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DNA methylation most commonly occurs as 5-methylcytosine (5mC) in the human genome and has been associated with human diseases. Recent developments in single-molecule sequencing technologies (Oxford Nanopore Technologies (ONT) and Pacific Biosciences) have enabled readouts of long, native DNA molecules, including cytosine methylation. ONT recently upgraded their Nanopore sequencing chemistry and kits from the R9 to the R10 version, which yielded increased accuracy and sequencing throughput. However, the effects on methylation detection have not yet been documented. Here, we performed a series of computational analyses to characterize differences in Nanopore-based 5mC detection between the ONT R9 and R10 chemistries. We compared 5mC calls in R9 and R10 for three human genome datasets: a cell line, a frontal cortex brain sample, and a blood sample. We performed an in-depth analysis on CpG islands and homopolymer regions, and documented high concordance for methylation detection among sequencing technologies. The strongest correlation was observed between Nanopore R10 and Illumina bisulfite technologies for cell line-derived datasets. Subtle differences in methylation datasets between technologies can impact analysis tools such as differential methylation calling software. Our findings show that comparisons can be drawn between methylation data from different Nanopore chemistries using guided hypotheses. This work will facilitate comparison among Nanopore data cohorts derived using different chemistries from large-scale sequencing efforts, such as the NIH CARD Long Read Initiative.
{"title":"Assessing DNA methylation detection for primary human tissue using nanopore sequencing","authors":"Rylee Genner, Stuart Akeson, Melissa Meredith, Pilar Alvarez Jerez, Laksh Malik, Breeana Baker, Abigail Miano-Burkhardt, CARD-long-read Team, Benedict Paten, Kimberley J Billingsley, Cornelis Blauwendraat, Miten Jain","doi":"10.1101/gr.279159.124","DOIUrl":"https://doi.org/10.1101/gr.279159.124","url":null,"abstract":"DNA methylation most commonly occurs as 5-methylcytosine (5mC) in the human genome and has been associated with human diseases. Recent developments in single-molecule sequencing technologies (Oxford Nanopore Technologies (ONT) and Pacific Biosciences) have enabled readouts of long, native DNA molecules, including cytosine methylation. ONT recently upgraded their Nanopore sequencing chemistry and kits from the R9 to the R10 version, which yielded increased accuracy and sequencing throughput. However the effects on methylation detection have not yet been documented. Here, we performed a series of computational analyses to characterize differences in Nanopore-based 5mC detection between the ONT R9 and R10 chemistries. We compared 5mC calls in R9 and R10 for three human genome datasets: a cell line, a frontal cortex brain sample, and a blood sample. We performed an in-depth analysis on CpG islands and homopolymer regions, and documented high concordance for methylation detection among sequencing technologies. The strongest correlation was observed between Nanopore R10 and Illumina bisulfite technologies for cell line-derived datasets. Subtle differences in methylation datasets between technologies can impact analysis tools such as differential methylation calling software. Our findings show that comparisons can be drawn between methylation data from different Nanopore chemistries using guided hypotheses. 
This work will facilitate comparison among Nanopore data cohorts derived using different chemistries from large scale sequencing efforts, such as the NIH CARD Long Read Initiative.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"26 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143569561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Meng Wang, Yumei Li, Jun Wang, Soo Hwan Oh, Yexuan Cao, Rui Chen
The vast majority of protein-coding genes in the human genome produce multiple mRNA isoforms through alternative splicing, significantly enhancing the complexity of the transcriptome and proteome. To establish an efficient method for characterizing transcript isoforms within tissue samples, we conducted a systematic comparison between single-cell long-read and conventional short-read RNA sequencing techniques. The transcriptome of approximately 30,000 mouse retina cells was profiled using 1.54 billion Illumina short reads and 1.40 billion Oxford Nanopore Technologies long reads. Consequently, we identify 44,325 transcript isoforms, with a notable 38% previously uncharacterized and 17% expressed exclusively in distinct cellular subclasses. We observe that long-read sequencing not only matches the gene expression and cell-type annotation performance of short-read sequencing but also excels in the precise identification of transcript isoforms. While transcript isoforms are often shared across various cell types, their relative abundance shows considerable cell type–specific variation. The data generated from our study significantly enhance the existing repertoire of transcript isoforms, thereby establishing a resource for future research into the mechanisms and implications of alternative splicing within retinal biology and its links to related diseases.
{"title":"Integrating short-read and long-read single-cell RNA sequencing for comprehensive transcriptome profiling in mouse retina","authors":"Meng Wang, Yumei Li, Jun Wang, Soo Hwan Oh, Yexuan Cao, Rui Chen","doi":"10.1101/gr.279167.124","DOIUrl":"https://doi.org/10.1101/gr.279167.124","url":null,"abstract":"The vast majority of protein-coding genes in the human genome produce multiple mRNA isoforms through alternative splicing, significantly enhancing the complexity of the transcriptome and proteome. To establish an efficient method for characterizing transcript isoforms within tissue samples, we conducted a systematic comparison between single-cell long-read and conventional short-read RNA sequencing techniques. The transcriptome of approximately 30,000 mouse retina cells was profiled using 1.54 billion Illumina short reads and 1.40 billion Oxford Nanopore Technologies long reads. Consequently, we identify 44,325 transcript isoforms, with a notable 38% previously uncharacterized and 17% expressed exclusively in distinct cellular subclasses. We observe that long-read sequencing not only matches the gene expression and cell-type annotation performance of short-read sequencing but also excel in the precise identification of transcript isoforms. While transcript isoforms are often shared across various cell types, their relative abundance shows considerable cell type–specific variation. 
The data generated from our study significantly enhance the existing repertoire of transcript isoforms, thereby establishing a resource for future research into the mechanisms and implications of alternative splicing within retinal biology and its links to related diseases.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"489 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143569559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiaoyue Duan, Chaolei Chen, Chang Du, Liang Guo, Jun Liu, Naipeng Hou, Pan Li, Xiaolan Qi, Fei Gao, Xuguang Du, Jiangping Song, Sen Wu
Although CRISPR-Cas-based genome editing has made significant strides over the past decade, achieving simultaneous homozygous gene editing of multiple targets in primary cells remains a major challenge. In this study, we optimized a coselection strategy to enhance homozygous gene editing rates in the genomes of primary porcine fetal fibroblasts (PFFs). The strategy utilizes the expression of a surrogate reporter (eGFP) to select for cells with the highest reporter expression, thereby improving editing efficiency. When applied to simultaneous multigene editing, we targeted the most challenging site for selection, while other target sites did not require selection. Using this approach, we successfully obtained single-cell PFF clones (3/10) with seven or more homozygously edited genes, including GGTA1, CMAH, B4GALNT2, CD46, CD47, THBD, and GHR. Importantly, cells edited using this strategy were efficiently used for somatic cell nuclear transfer (SCNT) to generate healthy xenotransplantation pigs in less than five months, a process that previously required years of breeding or multiple rounds of SCNT.
{"title":"Homozygous editing of multiple genes for accelerated generation of xenotransplantation pigs","authors":"Xiaoyue Duan, Chaolei Chen, Chang Du, Liang Guo, Jun Liu, Naipeng Hou, Pan Li, Xiaolan Qi, Fei Gao, Xuguang Du, Jiangping Song, Sen Wu","doi":"10.1101/gr.279709.124","DOIUrl":"https://doi.org/10.1101/gr.279709.124","url":null,"abstract":"Although CRISPR-Cas based genome editing has made significant strides over the past decade, achieving simultaneous homozygous gene editing of multiple targets in primary cells remains a significant challenge. In this study, we optimized a coselection strategy to enhance homozygous gene editing rates in the genomes of primary porcine fetal fibroblasts (PFFs). The strategy utilizes the expression of a surrogate reporter (eGFP) to select for cells with the highest reporter expression, thereby improving editing efficiency. When applied to simultaneous multigene editing, we targeted the most challenging site for selection, while other target sites did not require selection. Using this approach, we successfully obtained single-cell PFF clones (3/10) with seven or more homozygously edited genes, including <em>GGTA1</em>, <em>CMAH</em>, <em>B4GALNT2</em>, <em>CD46</em>, <em>CD47</em>, <em>THBD</em>, and <em>GHR</em>. 
Importantly, cells edited using this strategy were efficiently used for somatic cell nuclear transfer (SCNT) to generate healthy xenotransplantation pigs in less than five months, a process that previously required years of breeding or multiple rounds of SCNT.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"2 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143560703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kang Du, Oliver Deusch, Ilja Bezrukov, Christa Lanz, Yann Guiguen, Margarete Hoffmann, Anette Habring, Detlef Weigel, Manfred Schartl, Christine Dreyer
The guppy Y Chromosome has been a paradigmatic model for studying the genetics of sex-linked traits and Y Chromosome–driven evolution for more than a century. Despite strong efforts, knowledge of the genomic organization and molecular differentiation of the sex chromosome pair remains unsatisfactory and partly contradictory with respect to regions of reduced recombination. In particular, the border between the pseudoautosomal and male-specific regions of the Y has not yet been defined. To circumvent the problems in assigning the repeat-rich differentiated hemizygous or heterozygous sequences of the sex chromosome pair, we sequenced a YY male generated by a cross of a sex-reversed Maculatus strain XY female to a normal XY male from the inbred Guanapo population. High-molecular-weight genomic DNA from the YY male was sequenced on the Pacific Biosciences platform, and both Y haplotypes were reconstructed by Trio binning. By mapping male-specific SNPs and RADseq sequences, we identify a single male-specific region of ∼5 Mb in length at the distal end of the Y (MSY). Sequence divergence between X and Y in the segment is on average five times higher than in the proximal part, in agreement with reduced recombination. The MSY is enriched for repeats and transposons but does not differ in the content of coding genes from the X, indicating that genic degeneration has not progressed to a measurable degree.
{"title":"Identification of the male-specific region on the guppy Y Chromosome from a haplotype-resolved assembly","authors":"Kang Du, Oliver Deusch, Ilja Bezrukov, Christa Lanz, Yann Guiguen, Margarete Hoffmann, Anette Habring, Detlef Weigel, Manfred Schartl, Christine Dreyer","doi":"10.1101/gr.279582.124","DOIUrl":"https://doi.org/10.1101/gr.279582.124","url":null,"abstract":"The guppy Y Chromosome has been a paradigmatic model for studying the genetics of sex-linked traits and Y Chromosome–driven evolution for more than a century. Despite strong efforts, knowledge on genomic organization and molecular differentiation of the sex chromosome pair remains unsatisfactory and partly contradictory with respect to regions of reduced recombination. Especially the border between pseudoautosomal and male-specific regions of the Y has not been defined so far. To circumvent the problems in assigning the repeat-rich differentiated hemizygous or heterozygous sequences of the sex chromosome pair, we sequenced a YY male generated by a cross of a sex-reversed Maculatus strain XY female to a normal XY male from the inbred Guanapo population. High-molecular-weight genomic DNA from the YY male was sequenced on the Pacific Biosciences platform, and both Y haplotypes were reconstructed by Trio binning. By mapping of male specific SNPs and RADseq sequences, we identify a single male specific-region of ∼5 Mb length at the distal end of the Y (MSY). Sequence divergence between X and Y in the segment is on average five times higher than in the proximal part in agreement with reduced recombination. 
The MSY is enriched for repeats and transposons but does not differ in the content of coding genes from the X, indicating that genic degeneration has not progressed to a measurable degree.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"2 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143546119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Interactions between mitochondrial and nuclear factors are essential to life. Nevertheless, the importance of coordinated regulation of mitochondrial–nuclear gene expression (CMNGE) to changing physiological conditions is poorly understood and is limited to certain tissues and organisms. We hypothesized that CMNGE is important for development across vertebrates and, hence, should be conserved. As a first step, we analyzed more than 1400 RNA-seq experiments performed during prenatal development, in neonates, and in adults across vertebrate evolution. We find conserved sharp elevation of CMNGE after birth, including oxidative phosphorylation (OXPHOS) and mitochondrial ribosome genes, in the heart, hindbrain, forebrain, and kidney across mammals, as well as in Gallus gallus and in the lizard Anolis carolinensis. This is accompanied by elevated expression of TCA cycle enzymes and reduction in hypoxia response genes, suggesting a conserved cross-tissue metabolic switch after birth/hatching. Analysis of about 70 known regulators of mitochondrial gene expression reveals consistently elevated expression of PPARGC1A (PGC1 alpha) and CEBPB after birth/hatching across organisms and tissues, thus highlighting them as candidate regulators of CMNGE upon transition to the neonate. Analyses of Danio rerio, Xenopus tropicalis, Caenorhabditis elegans, and Drosophila melanogaster reveal elevated CMNGE prior to hatching in X. tropicalis and in D. melanogaster, which is associated with the emergence of muscle activity. Lack of such an ancient pattern in mammals and in chickens suggests that it was lost during radiation of terrestrial vertebrates. Taken together, our results suggest that regulated CMNGE after birth reflects an essential metabolic switch that is under strong selective constraints.
{"title":"Vertebrates show coordinated elevated expression of mitochondrial and nuclear genes after birth","authors":"Hadar Medini, Dan Mishmar","doi":"10.1101/gr.279700.124","DOIUrl":"https://doi.org/10.1101/gr.279700.124","url":null,"abstract":"Interactions between mitochondrial and nuclear factors are essential to life. Nevertheless, the importance of coordinated regulation of mitochondrial–nuclear gene expression (CMNGE) to changing physiological conditions is poorly understood and is limited to certain tissues and organisms. We hypothesized that CMNGE is important for development across vertebrates and, hence, should be conserved. As a first step, we analyzed more than 1400 RNA-seq experiments performed during prenatal development, in neonates, and in adults across vertebrate evolution. We find conserved sharp elevation of CMNGE after birth, including oxidative phosphorylation (OXPHOS) and mitochondrial ribosome genes, in the heart, hindbrain, forebrain, and kidney across mammals, as well as in <em>Gallus gallus</em> and in the lizard <em>Anolis carolinensis</em>. This is accompanied by elevated expression of TCA cycle enzymes and reduction in hypoxia response genes, suggesting a conserved cross-tissue metabolic switch after birth/hatching. Analysis of about 70 known regulators of mitochondrial gene expression reveals consistently elevated expression of <em>PPARGC1A</em> (PGC1 alpha) and <em>CEBPB</em> after birth/hatching across organisms and tissues, thus highlighting them as candidate regulators of CMNGE upon transition to the neonate. Analyses of <em>Danio rerio</em>, <em>Xenopus tropicalis, Caenorhabditis elegans</em>, and <em>Drosophila melanogaster</em> reveal elevated CMNGE prior to hatching in <em>X. tropicalis</em> and in <em>D. melanogaster</em>, which is associated with the emergence of muscle activity. Lack of such an ancient pattern in mammals and in chickens suggests that it was lost during radiation of terrestrial vertebrates. 
Taken together, our results suggest that regulated CMNGE after birth reflects an essential metabolic switch that is under strong selective constraints.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"16 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143546122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rodolphe Dombey, Daniel Buendía-Ávila, Verónica Barragán-Borrero, Laura Diezma-Navas, Arturo Ponce-Mañe, José Mario Vargas-Guerrero, Rana Elias, Arturo Marí-Ordóñez
A handful of model plants have provided insight into silencing of transposable elements (TEs) through RNA-directed DNA methylation (RdDM). Guided by 24 nt long small-interfering RNAs (siRNAs), this epigenetic regulation installs DNA methylation and histone modifications like H3K9me2, which can be subsequently maintained independently of siRNAs. However, the genome of the clonally propagating duckweed Spirodela polyrhiza (Lemnaceae) has low levels of DNA methylation, very low expression of RdDM components, and a near absence of 24 nt siRNAs. Moreover, some genes encoding RdDM factors, DNA methylation maintenance, and RNA silencing mechanisms are missing from the genome. Here, we investigated the distribution of TEs and their epigenetic marks in the Spirodela genome. Although abundant degenerated TEs have largely lost DNA methylation and H3K9me2 is low, they remain marked by the heterochromatin-associated H3K9me1 and H3K27me1 modifications. In contrast, we find high levels of DNA methylation and H3K9me2 in the relatively few intact TEs, which are a source of 24 nt siRNAs, like RdDM-controlled TEs in other angiosperms. The data suggest that, potentially as an adaptation to vegetative propagation, RdDM extent, silencing components, and targets are different from other angiosperms, preferentially focused on potentially intact TEs. They also provide evidence for heterochromatin maintenance independently of DNA methylation in flowering plants. These discoveries highlight the diversity of silencing mechanisms that exist in plants and the importance of using disparate model species to discover these mechanisms.
{"title":"Atypical epigenetic and small RNA control of degenerated transposons and their fragments in clonally reproducing Spirodela polyrhiza","authors":"Rodolphe Dombey, Daniel Buendía-Ávila, Verónica Barragán-Borrero, Laura Diezma-Navas, Arturo Ponce-Mañe, José Mario Vargas-Guerrero, Rana Elias, Arturo Marí-Ordóñez","doi":"10.1101/gr.279532.124","DOIUrl":"https://doi.org/10.1101/gr.279532.124","url":null,"abstract":"A handful of model plants have provided insight into silencing of transposable elements (TEs) through RNA-directed DNA methylation (RdDM). Guided by 24 nt long small-interfering RNAs (siRNAs), this epigenetic regulation installs DNA methylation and histone modifications like H3K9me2, which can be subsequently maintained independently of siRNAs. However, the genome of the clonally propagating duckweed <em>Spirodela polyrhiza</em> (<em>Lemnaceae</em>) has low levels of DNA methylation, very low expression of RdDM components, and near absence of 24 nt siRNAs. Moreover, some genes encoding RdDM factors, DNA methylation maintenance, and RNA silencing mechanisms are missing from the genome. Here, we investigated the distribution of TEs and their epigenetic marks in the <em>Spirodela</em> genome. Although abundant degenerated TEs have largely lost DNA methylation and H3K9me2 is low, they remain marked by the heterochromatin-associated H3K9me1 and H3K27me1 modifications. In contrast, we find high levels of DNA methylation and H3K9me2 in the relatively few intact TEs, which are source of 24 nt siRNAs, like RdDM-controlled TEs in other angiosperms. The data suggest that, potentially as adaptation to vegetative propagation, RdDM extent, silencing components, and targets are different from other angiosperms, preferentially focused on potentially intact TEs. It also provides evidence for heterochromatin maintenance independently of DNA methylation in flowering plants. 
These discoveries highlight the diversity of silencing mechanisms that exist in plants and the importance of using disparate model species to discover these mechanisms.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"822 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143546118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Merel Stemerdink, Tabea Riepe, Nick Zomer, Renee Salz, Michael Kwint, Jaap Oostrik, Raoul Timmermans, Barbara Ferrari, Stefano Ferrari, Alfredo Duenas Rey, Emma Delanote, Suzanne E de Bruijn, Hannie Kremer, Susanne Roosing, Frauke Coppieters, Alexander Hoischen, Frans P.M. Cremers, Peter-Bram A.C. 't Hoen, Erwin van Wijk, Erik de Vrieze
Sequencing technologies have long limited the comprehensive investigation of large transcripts associated with inherited retinal diseases (IRDs) like Usher syndrome, which involves 11 associated genes with transcripts up to 19.6 kb. To address this, we used PacBio long-read mRNA isoform sequencing (Iso-Seq) following standard library preparation and an optimized workflow to enrich for long transcripts in the human neural retina. While our workflow achieved sequencing of transcripts up to 15 kb, this was insufficient for Usher syndrome-associated genes USH2A and ADGRV1, with transcripts of 18.9 kb and 19.6 kb, respectively. To overcome this, we employed the Samplix Xdrop System for indirect target enrichment of cDNA, a technique typically used for genomic DNA capture. This method facilitated the successful capture and sequencing of ADGRV1 transcripts as well as full-length 18.9 kb USH2A transcripts. By combining algorithmic analysis with detailed manual curation of sequenced reads, we identified novel isoforms characterized by an alternative 5' transcription start site, the inclusion of previously unannotated exons, or alternative splicing events across the 11 Usher syndrome-associated genes. These findings have significant implications for genetic diagnostics and therapeutic development. The analysis applied here to Usher syndrome-associated transcripts exemplifies a valuable approach that can be extended to explore the transcriptomic complexity of other IRD-associated genes in the complete transcriptome dataset generated within this study. Additionally, we demonstrated the adaptability of the Samplix Xdrop System for capturing cDNA, and the optimized methodologies described can be expanded to facilitate the enrichment of large transcripts from various tissues of interest.
{"title":"Deciphering the largest disease-associated transcript isoforms in the human neural retina with advanced long-read sequencing approaches","authors":"Merel Stemerdink, Tabea Riepe, Nick Zomer, Renee Salz, Michael Kwint, Jaap Oostrik, Raoul Timmermans, Barbara Ferrari, Stefano Ferrari, Alfredo Duenas Rey, Emma Delanote, Suzanne E de Bruijn, Hannie Kremer, Susanne Roosing, Frauke Coppieters, Alexander Hoischen, Frans P.M. Cremers, Peter-Bram A.C. 't Hoen, Erwin van Wijk, Erik de Vrieze","doi":"10.1101/gr.280060.124","DOIUrl":"https://doi.org/10.1101/gr.280060.124","url":null,"abstract":"Sequencing technologies have long limited the comprehensive investigation of large transcripts associated with inherited retinal diseases (IRDs) like Usher syndrome, which involves 11 associated genes with transcripts up to 19.6 kb. To address this, we used PacBio long-read mRNA isoform sequencing (Iso-Seq) following standard library preparation and an optimized workflow to enrich for long transcripts in the human neural retina. While our workflow achieved sequencing of transcripts up to 15 kb, this was insufficient for Usher syndrome-associated genes <em>USH2A</em> and <em>ADGRV1</em>, with transcripts of 18.9 kb and 19.6 kb, respectively. To overcome this, we employed the Samplix Xdrop System for indirect target enrichment of cDNA, a technique typically used for genomic DNA capture. This method facilitated the successful capture and sequencing of <em>ADGRV1</em> transcripts as well as full-length 18.9 kb <em>USH2A</em> transcripts. By combining algorithmic analysis with detailed manual curation of sequenced reads, we identified novel isoforms characterized by an alternative 5' transcription start site, the inclusion of previously unannotated exons or alternative splicing events across the 11 Usher syndrome-associated genes. These findings have significant implications for genetic diagnostics and therapeutic development. 
The analysis applied here on Usher syndrome-associated transcripts exemplifies a valuable approach that can be extended to explore the transcriptomic complexity of other IRD-associated genes in the complete transcriptome dataset generated within this study. Additionally, we demonstrated the adaptability of the Samplix Xdrop system for capturing cDNA, and the optimized methodologies described can be expanded to facilitate the enrichment of large transcripts from various tissues of interest.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"2 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143546121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}