Siavash Raeisi Dehkordi, Zhaoyang Jia, Joey Estabrook, Jen Hauenstein, Neil Miller, Naz Güleray-Lafci, Jürgen Neesen, Alex Hastie, Alka Chaubey, Andy Wing Chun Pang, Paul Dremsek, Vineet Bafna
The whole-genome karyotype refers to the sequence of large chromosomal segments comprising an individual's genotype. Karyotype analysis, which includes identifying aneuploidies and structural rearrangements, is essential for understanding genetic risk factors, informing diagnosis and treatment, and guiding genetic counseling in constitutional disorders. The current karyotyping standard relies on microscopic chromosome examination, a complex and expertise-dependent process with megabase-scale resolution. Optical genome mapping (OGM) technology offers an efficient approach to detect large-scale genomic lesions. Here, we introduce OMKar, a computational method that generates virtual karyotypes from OGM data. OMKar integrates structural variants (SVs) and copy number (CN) variants into a breakpoint graph representation. It re-estimates CNs using integer linear programming to enforce CN balance and then identifies constrained Eulerian paths corresponding to full chromosome structures. OMKar is evaluated on 38 whole-genome simulations of constitutional disorders, achieving 88% precision and 95% recall for SV concordance and a 95% Jaccard score for CN concordance. We further apply OMKar to 154 clinical samples including 50 prenatal, 41 postnatal, and 63 parental genomes collected across 10 sites. It correctly reconstructs the karyotype in 144 cases, including 25 of 25 aneuploidies, 32 of 32 balanced translocations, and 72 of 82 unbalanced rearrangements. Identified disorders include cri-du-chat, Wolf–Hirschhorn, Prader–Willi, Down, and Turner syndromes. Notably, OMKar uncovers plausible genetic mechanisms in five previously unexplained cases. These results demonstrate the accuracy and utility of OMKar for OGM-based constitutional karyotyping.
{"title":"OMKar automates genome karyotyping using optical maps to identify constitutional abnormalities","authors":"Siavash Raeisi Dehkordi, Zhaoyang Jia, Joey Estabrook, Jen Hauenstein, Neil Miller, Naz Güleray-Lafci, Jürgen Neesen, Alex Hastie, Alka Chaubey, Andy Wing Chun Pang, Paul Dremsek, Vineet Bafna","doi":"10.1101/gr.280536.125","DOIUrl":"https://doi.org/10.1101/gr.280536.125","journal":{"name":"Genome research"},"PeriodicalIF":7.0,"publicationDate":"2025-11-14","publicationTypes":"Journal Article"}
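The OMKar abstract describes reading chromosome structures off a copy-number-balanced graph as Eulerian paths. The sketch below illustrates that idea only in a heavily simplified form: it treats breakpoints as nodes of a directed multigraph, uses edge multiplicity as a stand-in for copy number, and runs Hierholzer's algorithm. The node names, the directed-edge simplification, and the function `eulerian_path` are assumptions made for illustration; this is not OMKar's breakpoint graph or its constrained path search.

```python
from collections import Counter, defaultdict


def eulerian_path(edges, start):
    """Return one Eulerian path (list of nodes) via Hierholzer's algorithm.

    edges: dict mapping (u, v) -> multiplicity, a stand-in for copy number.
    start: node at which the path must begin.
    Raises ValueError if no walk from `start` uses every edge copy exactly once.
    """
    # Expand multiplicities into adjacency lists of unused edge copies.
    adjacency = defaultdict(list)
    for (u, v), copies in edges.items():
        adjacency[u].extend([v] * copies)

    stack, path = [start], []
    while stack:
        node = stack[-1]
        if adjacency[node]:
            stack.append(adjacency[node].pop())  # follow an unused edge copy
        else:
            path.append(stack.pop())  # dead end: emit node on the way back
    path.reverse()

    # Verify that consecutive nodes consume exactly the given edge copies.
    used = Counter(zip(path, path[1:]))
    expected = Counter({e: c for e, c in edges.items() if c > 0})
    if used != expected:
        raise ValueError("graph has no Eulerian path from the given start")
    return path


# Hypothetical toy chromosome with a tandem duplication of segment p1->p2:
# the segment has multiplicity 2 and one extra adjacency leads from p2 back to p1.
toy_edges = {
    ("p0", "p1"): 1,
    ("p1", "p2"): 2,
    ("p2", "p1"): 1,
    ("p2", "p3"): 1,
}
toy_path = eulerian_path(toy_edges, "p0")
print(toy_path)
```

In the toy graph every internal node has equal in- and out-multiplicity, which is the balance condition that makes an Eulerian path possible; the abstract states that OMKar enforces this kind of balance by re-estimating copy numbers with integer linear programming before searching for paths.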
Enzo Battistella, Anant Maheshwari, Barış Ekim, Bonnie Berger, Victoria Popic
Haplotype assembly is the problem of reconstructing the combination of alleles on the maternally and paternally inherited chromosome copies. Individual haplotypes are essential to our understanding of how combinations of different variants impact phenotype. In this work, we focus on read-based haplotype assembly of individual diploid genomes, which reconstructs the two haplotypes directly from read alignments at variant loci. We introduce Ralphi, a novel deep reinforcement learning framework for haplotype assembly, which integrates the representational power of deep learning with reinforcement learning to accurately partition read fragments into their respective haplotype sets. To set the reward objective for reinforcement learning, our approach uses the classic reduction of the problem to the maximum fragment cut formulation on fragment graphs, in which nodes correspond to reads and edge weights capture the conflict or agreement of the reads at shared variant sites. We train Ralphi on a diverse data set of fragment graph topologies derived from genomes in the 1000 Genomes Project. We show that Ralphi achieves lower error rates than the state of the art, at comparable or longer haplotype block lengths, for short and long reads at varying coverage in standard human genome benchmarks.
{"title":"Graph-based deep reinforcement learning for haplotype assembly with Ralphi","authors":"Enzo Battistella, Anant Maheshwari, Barış Ekim, Bonnie Berger, Victoria Popic","doi":"10.1101/gr.280569.125","DOIUrl":"https://doi.org/10.1101/gr.280569.125","journal":{"name":"Genome research"},"PeriodicalIF":7.0,"publicationDate":"2025-11-14","publicationTypes":"Journal Article"}
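The maximum fragment cut formulation mentioned in the Ralphi abstract can be made concrete on a tiny fragment graph. The sketch below assumes a simple edge weight (conflicting shared sites minus agreeing shared sites) and finds the best bipartition by exhaustive search. The weighting scheme, the fragment encoding, and the function names are illustrative assumptions; Ralphi itself learns a partitioning policy with reinforcement learning rather than enumerating cuts.

```python
from itertools import combinations


def edge_weight(frag_a, frag_b):
    """Conflicts minus agreements over variant sites covered by both fragments."""
    shared = frag_a.keys() & frag_b.keys()
    conflicts = sum(1 for site in shared if frag_a[site] != frag_b[site])
    return conflicts - (len(shared) - conflicts)


def max_fragment_cut(fragments):
    """Exhaustively search bipartitions; feasible only for a handful of fragments."""
    names = sorted(fragments)
    best_value, best_side = float("-inf"), None
    # Fix names[0] on side 0 so mirrored partitions are not evaluated twice.
    for mask in range(2 ** (len(names) - 1)):
        side = {names[0]: 0}
        for i, name in enumerate(names[1:]):
            side[name] = (mask >> i) & 1
        # Cut value: total weight of edges whose endpoints are on different sides.
        value = sum(
            edge_weight(fragments[a], fragments[b])
            for a, b in combinations(names, 2)
            if side[a] != side[b]
        )
        if value > best_value:
            best_value, best_side = value, side
    return best_value, best_side


# Hypothetical reads: each maps variant site -> observed allele (0 or 1).
reads = {
    "r1": {1: 0, 2: 0},
    "r2": {2: 0, 3: 1},
    "r3": {1: 1, 2: 1},
    "r4": {2: 1, 3: 0},
}
cut_value, assignment = max_fragment_cut(reads)
print(cut_value, assignment)
```

Reads that disagree at shared sites contribute positive weight when separated, so the maximum cut places r1 and r2 on one haplotype and r3 and r4 on the other.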
Ellie Haber, Ajinkya Deshpande, Jian Ma, Spencer Krieger
Spatial transcriptomics (ST) has transformed our understanding of tissue architecture and cellular interactions, but integrating ST data across platforms remains challenging due to differences in gene panels, data sparsity, and technical variability. Here, we introduce LLOKI, a novel framework for integrating imaging-based ST data from diverse platforms without requiring shared gene panels. LLOKI addresses ST integration through two key alignment tasks: feature alignment across technologies and batch alignment across data sets. Optimal transport-guided feature propagation adjusts data sparsity to match scRNA-seq references through graph-based imputation, enabling single-cell foundation models such as scGPT to generate unified features. Batch alignment then refines scGPT-transformed embeddings, mitigating batch effects while preserving biological variability. Evaluations on mouse brain samples from five different technologies demonstrate that LLOKI outperforms existing methods and is effective for cross-technology spatial gene program identification and tissue slice alignment. Applying LLOKI to five ovarian cancer data sets, we identify an integrated gene program indicative of tumor-infiltrating T cells across gene panels. Together, LLOKI provides a robust foundation for cross-platform ST studies, with the potential to scale to large atlas data sets, enabling deeper insights into cellular organization and tissue environments.
{"title":"Unified integration of spatial transcriptomics across platforms with LLOKI","authors":"Ellie Haber, Ajinkya Deshpande, Jian Ma, Spencer Krieger","doi":"10.1101/gr.280803.125","DOIUrl":"https://doi.org/10.1101/gr.280803.125","journal":{"name":"Genome research"},"PeriodicalIF":7.0,"publicationDate":"2025-11-13","publicationTypes":"Journal Article"}
Laura Blanco-Berdugo, Alexis Garretson, Beth L Dumont
Approximately 2.6% of live births in the United States are conceived using assisted reproductive technologies (ART). While some ART procedures, including in vitro fertilization (IVF) and intracytoplasmic sperm injection, are known to alter the epigenetic landscape of early embryonic development, their impact on DNA sequence stability is unclear. Here, we leverage the strengths of the laboratory mouse model system to investigate whether a standard ART series (ovarian hyperstimulation, gamete isolation, IVF, embryo culture, and embryo transfer) affects genome stability. Age-matched cohorts of 12 ART-derived and 16 naturally conceived C57BL/6J inbred mice were reared in a controlled setting and whole-genome sequenced to ~50× coverage. Using a rigorous pipeline for de novo single nucleotide variant (dnSNV) discovery, we observe a ~30% (95% CI: 4.5% to 56%) increase in the dnSNV rate in ART-derived compared to naturally conceived mice (P = 0.017). Analysis of the dnSNV mutation spectrum identified signatures attributable to germline DNA repair activity but revealed no differentially enriched signatures between cohorts. We observe no enrichment of dnSNVs in specific genomic contexts, suggesting that the observed rate increase in ART-derived mice is a general genome-wide phenomenon. Together, our findings show that ART is moderately mutagenic in house mice and motivate future work to define the procedure(s) associated with this increased mutational vulnerability. While we caution that our findings cannot be immediately translated to humans, they nonetheless emphasize a pressing need for investigations on the potential mutagenicity of ART in our species.
{"title":"Modest increase in the de novo single nucleotide mutation rate in house mice born by assisted reproduction","authors":"Laura Blanco-Berdugo, Alexis Garretson, Beth L Dumont","doi":"10.1101/gr.281180.125","DOIUrl":"https://doi.org/10.1101/gr.281180.125","journal":{"name":"Genome research"},"PeriodicalIF":7.0,"publicationDate":"2025-11-13","publicationTypes":"Journal Article"}
An Wang, Stephanie Hicks, Donald Geman, Laurent Younes
The selection of marker gene panels is critical for capturing the cellular and spatial heterogeneity in the expanding atlases of single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics data. Most current approaches to marker gene selection operate in a label-based framework, which is inherently limited by its dependency on predefined cell type labels or clustering results. In contrast, existing label-free methods often struggle to identify genes that characterize rare cell types or subtle spatial patterns, and they frequently fail to scale efficiently with large data sets. Here, we introduce geneCover, a label-free combinatorial method that selects an optimal panel of minimally redundant marker genes based on gene-gene correlations. Our method demonstrates excellent scalability to large data sets and identifies marker gene panels that capture distinct correlation structures across the transcriptome. This allows geneCover to effectively distinguish cell states in various tissues, including those associated with rare or otherwise difficult-to-identify cell types. We evaluate the performance of geneCover across various scRNA-seq and spatial transcriptomics data sets, comparing it to other label-free algorithms to highlight its utility and potential in diverse biological contexts.
{"title":"Label-free selection of marker genes in single-cell and spatial transcriptomics with geneCover","authors":"An Wang, Stephanie Hicks, Donald Geman, Laurent Younes","doi":"10.1101/gr.280539.125","DOIUrl":"https://doi.org/10.1101/gr.280539.125","journal":{"name":"Genome research"},"PeriodicalIF":7.0,"publicationDate":"2025-11-12","publicationTypes":"Journal Article"}
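Selecting a minimally redundant panel from gene-gene correlations, as the geneCover abstract describes, can be pictured as a covering problem: a chosen gene "covers" every gene strongly correlated with it, and the panel should cover all genes with as few picks as possible. The sketch below uses a greedy set-cover heuristic as a stand-in. The abstract does not state how geneCover solves its combinatorial problem, so the greedy rule, the correlation threshold, and the function `greedy_gene_cover` are assumptions for illustration only.

```python
def greedy_gene_cover(corr, genes, threshold):
    """Greedily pick genes until every gene is covered by a selected one.

    corr: square list-of-lists of gene-gene correlations.
    Gene i covers gene j when |corr[i][j]| >= threshold; every gene covers itself.
    """
    n = len(genes)
    covers = [
        {j for j in range(n) if i == j or abs(corr[i][j]) >= threshold}
        for i in range(n)
    ]
    uncovered, panel = set(range(n)), []
    while uncovered:
        # Pick the gene covering the most uncovered genes (ties: lowest index).
        best = max(range(n), key=lambda i: (len(covers[i] & uncovered), -i))
        panel.append(genes[best])
        uncovered -= covers[best]
    return panel


# Hypothetical correlation matrix with two correlated groups: {A, B, C} and {D, E}.
gene_names = ["A", "B", "C", "D", "E"]
toy_corr = [
    [1.0, 0.9, 0.8, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1, 0.1],
    [0.8, 0.1, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.85],
    [0.1, 0.1, 0.1, 0.85, 1.0],
]
marker_panel = greedy_gene_cover(toy_corr, gene_names, threshold=0.7)
print(marker_panel)
```

Because the selection depends only on the correlation structure, no cell type labels or clustering results are needed, which is the sense in which such a method is label-free.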
Natnatee Dokmai, Kaiyuan Zhu, S. Cenk Sahinalp, Hyunghoon Cho
Genotype imputation servers enable researchers with limited resources to extract valuable insights from their data with enhanced accuracy and ease. However, the utility of these services is limited for those with sensitive study cohorts or those in restrictive regulatory environments owing to data privacy concerns. Although privacy-preserving analysis tools have been developed to broaden access to these servers, none of the existing methods support haplotype phasing, a critical component of the imputation workflow. The complexity of phasing algorithms poses a significant challenge in maintaining practical performance under privacy constraints. Here, we introduce TX-Phase, a secure haplotype phasing method based on the framework of trusted execution environments (TEEs). TX-Phase allows users’ private genomic data to be phased while ensuring data confidentiality and integrity of the computation. We introduce novel data-oblivious algorithmic techniques based on compressed reference panels and dynamic fixed-point arithmetic that comprehensively mitigate side-channel leakages in TEEs to provide robust protection of users’ genomic data throughout the analysis. Our experiments on a range of data sets from the UK Biobank and Haplotype Reference Consortium demonstrate the state-of-the-art phasing accuracy and practical runtimes of TX-Phase. Our work enables secure phasing of private genomes, opening access to large reference genomic data sets for a broader scientific community.
{"title":"Secure phasing of private genomes in a trusted execution environment with TX-Phase","authors":"Natnatee Dokmai, Kaiyuan Zhu, S. Cenk Sahinalp, Hyunghoon Cho","doi":"10.1101/gr.280558.125","DOIUrl":"https://doi.org/10.1101/gr.280558.125","journal":{"name":"Genome research"},"PeriodicalIF":7.0,"publicationDate":"2025-11-12","publicationTypes":"Journal Article"}
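Two ideas named in the TX-Phase abstract, data-oblivious computation and fixed-point arithmetic, can be illustrated in a few lines. The sketch below shows a table lookup that touches every entry regardless of the secret index, and a multiplication on integers scaled by a fixed number of fractional bits. Both functions and the constant `FRACTION_BITS` are hypothetical; the abstract does not detail TX-Phase's algorithms or its dynamic scaling, and Python itself offers no real side-channel guarantees, so this conveys only the access-pattern idea.

```python
FRACTION_BITS = 16  # assumed fixed scale; the paper's dynamic scheme is not shown here


def oblivious_lookup(table, secret_index):
    """Read table[secret_index] while visiting every entry in the same order."""
    result = 0
    for i, value in enumerate(table):
        # mask is 1 only at the secret index; arithmetic replaces a data-dependent branch.
        mask = int(i == secret_index)
        result = mask * value + (1 - mask) * result
    return result


def to_fixed(x):
    """Encode a real number as an integer with FRACTION_BITS fractional bits."""
    return int(round(x * (1 << FRACTION_BITS)))


def fixed_mul(a, b):
    """Multiply two fixed-point values and rescale back to FRACTION_BITS."""
    return (a * b) >> FRACTION_BITS


def to_float(a):
    """Decode a fixed-point integer back to a float."""
    return a / (1 << FRACTION_BITS)


looked_up = oblivious_lookup([10, 20, 30, 40], 2)
product = to_float(fixed_mul(to_fixed(0.5), to_fixed(0.25)))
print(looked_up, product)
```

The memory access sequence of `oblivious_lookup` is the same for every index, which is the property that prevents an observer of access patterns from learning which entry was needed.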
Fan Feng, Sean Moran, Anders Hansen, Xiaotian Zhang, Jie Liu
Chromatin stripes are architectural chromatin features in which a single loop anchor interacts with a contiguous region of DNA, so that at the bulk sequencing level the interaction appears as a long stripe on chromatin contact matrices. Stripes are thought to play an important role in gene regulation and have been implicated in regulating a cell's lineage determination. Therefore, integrated analysis of stripes with genomic and epigenomic features at a genome-wide scale shows vast potential for understanding the cooperation between regulatory elements in the 3D nucleome. To this end, we present Quagga, a computational tool for detection and statistical verification of genomic architectural stripes from Hi-C or Micro-C chromatin contact maps, which relies on robust image processing techniques and rigorous statistical tests for enrichment. Quagga outperforms other stripe detection methods in accuracy and is highly versatile, working with Hi-C, Micro-C, and other chromatin conformation capture data. By reporting on all tools' performance in classifying CTCF-cohesin anchored stripes, enhancer-promoter interacting stripes, and indeterminate stripes, we also demonstrate a thorough, integrated analysis to determine the output stripes' quality. Our work provides a flexible and convenient tool to help scientists explore the relationships between chromatin architectural stripes and important biological questions.
{"title":"Statistically rigorous and computationally efficient chromatin stripe detection with Quagga","authors":"Fan Feng, Sean Moran, Anders Hansen, Xiaotian Zhang, Jie Liu","doi":"10.1101/gr.280132.124","DOIUrl":"https://doi.org/10.1101/gr.280132.124","journal":{"name":"Genome research"},"PeriodicalIF":7.0,"publicationDate":"2025-11-12","publicationTypes":"Journal Article"}
Vikram S Shivakumar, Ben Langmead
Pangenome collections are growing to hundreds of high-quality genomes. This necessitates scalable methods for constructing pangenome alignments that can incorporate newly sequenced assemblies. We previously developed Mumemto, which computes maximal unique matches (multi-MUMs) across pangenomes using compressed indexing. In this work, we introduce MumemtoM (Mumemto Merge), comprising two new partitioning and merging strategies. Both strategies enable highly parallel, memory-efficient, and updateable computation of multi-MUMs. One of the strategies, called string-based merging, is also capable of conducting the merges in a way that follows the shape of a phylogenetic tree, naturally yielding the multi-MUMs for the tree's internal nodes as well as the root. With these strategies, Mumemto now scales to 474 human haplotypes, the only multi-MUM method able to do so. It also introduces a time-memory tradeoff that allows Mumemto to be tailored to more scenarios, including resource-limited settings.
{"title":"Partitioned Multi-MUM finding for scalable pangenomics with MumemtoM","authors":"Vikram S Shivakumar, Ben Langmead","doi":"10.1101/gr.280940.125","DOIUrl":"https://doi.org/10.1101/gr.280940.125","journal":{"name":"Genome research"},"PeriodicalIF":7.0,"publicationDate":"2025-11-07","publicationTypes":"Journal Article"}
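To make the notion of a multi-MUM concrete, the sketch below finds them by brute force on a few short strings: a multi-MUM is taken here to be a substring that occurs exactly once in every sequence and cannot be extended left or right while still matching in all of them. This only illustrates the definition under that assumption; it does not use compressed indexing and does not show the partitioning and merging strategies of MumemtoM. The function names are hypothetical.

```python
def count_occurrences(text, pattern):
    """Count possibly overlapping occurrences of pattern in text."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def multi_mums(seqs, min_len=1):
    """Return (match, positions) for every multi-MUM, by exhaustive search."""
    first, results = seqs[0], []
    for start in range(len(first)):
        for end in range(start + min_len, len(first) + 1):
            match = first[start:end]
            # The match must occur exactly once in every sequence.
            if any(count_occurrences(s, match) != 1 for s in seqs):
                continue
            positions = [s.find(match) for s in seqs]
            # Collect flanking characters; None marks a sequence boundary.
            left = {s[p - 1] if p > 0 else None for s, p in zip(seqs, positions)}
            right = {
                s[p + len(match)] if p + len(match) < len(s) else None
                for s, p in zip(seqs, positions)
            }
            # Maximal on a side if the flanks differ or a boundary blocks extension.
            left_maximal = len(left) > 1 or None in left
            right_maximal = len(right) > 1 or None in right
            if left_maximal and right_maximal:
                results.append((match, positions))
    return results


# Hypothetical toy sequences sharing the unique block "ACGT" with differing flanks.
toy_seqs = ["GACGTA", "TACGTC", "CACGTG"]
found = multi_mums(toy_seqs)
print(found)
```

The exhaustive search is quadratic in the number of candidate substrings and rescans every sequence for each one, which is why indexing and merging strategies are needed at pangenome scale.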
Oscar Aramburu, Belen G Pardo, Ada Jimenez-Gonzalez, Andrés Blanco-Hortas, Daniel Macqueen, Carmen Bouza, Paulino Martinez
Embryogenesis is the foundational step of ontogeny, where a complex organism emerges from a single totipotent cell. This process is orchestrated by changes in transcriptional regulation, influenced by chromatin accessibility and epigenetic modifications that enable transcription factor access. Epigenomic regulation of embryogenesis has been studied in model fish, but little attention has been paid to farmed fish, in which traits relevant to aquaculture rely on early developmental processes. This study reports a regulatory atlas of turbot (Scophthalmus maximus) embryogenesis. We identified 14,560 active genes in the embryonic transcriptome, with >90% showing differential expression across consecutive stages. By integrating multi-histone ChIP-seq with ATAC-seq, we built a genome-wide chromatin state model, defining promoter and enhancer activity across stages. Diverse transcription factor binding motifs were detected within regulatory elements showing differential accessibility at distinct developmental stages. Strong shifts in chromatin accessibility across stages, notably during the transition from shield to early segmentation, suggest profound chromatin reorganization underpinning somitogenesis and early organogenesis. Nevertheless, most changes in chromatin accessibility did not affect promoters of differentially expressed genes, suggesting that their accessibility precedes gene transcription changes. Comparative analyses with zebrafish revealed a global transcriptomic correlation of single-copy orthologs at matched stages. While conserved expression dynamics were revealed for many orthologous Hox genes, notable cross-species differences were identified from pre-ZGA leading up to hatching. This multi-omics investigation provides a novel atlas of noncoding regulatory elements controlling turbot development, with key applications for flatfish biology and sustainable aquaculture.
Title: Epigenomics of embryogenesis in turbot. Journal: Genome Research. Publication date: 2025-11-04. DOI: https://doi.org/10.1101/gr.280355.124
William R. Thomas, Cecilia Baldoni, Yuanyuan Zeng, David Carlson, Julie Holm-Jacobsen, Marion Muturi, Dominik von Elverfeldt, Tue B. Bennike, Dina Dechmann, John Nieland, Angelique Corthals, Liliana M Davalos
To meet the challenge of wintering in place, many high-latitude small mammals reduce energy demands through hibernation. In contrast, short-lived Eurasian common shrews, Sorex araneus, remain active and shrink in winter, including in energy-intensive organs, then regrow in spring, an evolved strategy called Dehnel's phenomenon. How this size change is linked to the metabolic and regulatory changes that sustain their high metabolism is unknown. We analyzed metabolic, proteomic, and gene expression profiles spanning the entirety of Dehnel's seasonal cycle in wild shrews. We show regulatory changes to oxidative phosphorylation and increased fatty acid metabolism during autumn-to-winter shrinkage, as previously found in hibernating species. In shrews, however, we also found upregulated winter expression of genes involved in gluconeogenesis: the biosynthesis of glucose from noncarbohydrate substrates. Coexpression models revealed that changes in size and metabolic gene expression interconnect via FOXO signaling, whose overexpression reduces size and extends lifespan in many model organisms. We propose that while shifts in gluconeogenesis meet the challenge posed by high metabolic rate and active winter lifestyle, FOXO signaling is central to Dehnel's phenomenon, with spring downregulation limiting lifespan in these shrews.
Title: Dynamic metabolic and molecular changes during seasonal shrinking in Sorex araneus. Journal: Genome Research. Publication date: 2025-11-04. DOI: https://doi.org/10.1101/gr.280639.125