Gene expression is shaped by transcriptional regulatory networks (TRNs), where transcription regulators interact within regulatory elements in a context-specific manner. Deciphering context-specific TRNs has long been constrained by the severe sparsity of cell-type-specific chromatin immunoprecipitation sequencing (ChIP-seq) profiles. Here, we present ChromBERT, a foundation model pre-trained on large-scale human ChIP-seq datasets covering ∼1,000 transcription regulators. ChromBERT learns the genome-wide syntax of regulatory cooperation and generates interpretable TRN representations. After prompt-enhanced fine-tuning, it outperforms existing methods for imputing unseen cistromes. Moreover, lightweight fine-tuning on cell-type-specific downstream tasks adapts the TRN representations to capture regulatory effects and dynamics within any given cellular context. The resulting context-specific representations can then be interpreted to infer regulatory roles of transcription regulators underlying these cell-type-specific regulatory outcomes without requiring additional ChIP-seq experiments. By overcoming the limitations of sparse transcription regulator data, ChromBERT significantly enhances our ability to model and interpret transcriptional regulation across a wide range of biological contexts.
{"title":"ChromBERT: A foundation model for learning interpretable representations for context-specific transcriptional regulatory networks.","authors":"Zhaowei Yu, Dongxu Yang, Qianqian Chen, Yuxuan Zhang, Zhanhao Li, Yucheng Wang, Chenfei Wang, Yong Zhang","doi":"10.1016/j.xgen.2025.101130","DOIUrl":"https://doi.org/10.1016/j.xgen.2025.101130","url":null,"PeriodicalId":72539,"journal":{"name":"Cell genomics","volume":" ","pages":"101130"},"PeriodicalIF":11.1,"publicationDate":"2026-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146069142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-26 DOI: 10.1016/j.xgen.2025.101138
Gaspard Kerner, Nolan Kamitaki, Benjamin Strober, Alkes L Price
Genome-wide association studies have identified thousands of disease-associated loci, yet their biological interpretation remains limited. We propose joint pleiotropic and epigenomic partitioning (J-PEP), a clustering framework that integrates pleiotropic SNP effects on auxiliary traits and tissue-specific epigenomic data to partition disease-associated loci into biologically distinct clusters. We introduce a metric, pleiotropic and epigenomic prediction accuracy (PEPA), that evaluates how well the clusters predict SNP-to-trait and SNP-to-tissue associations in off-chromosome data. Analyzing summary statistics for 165 diseases/traits (average N = 290,000), J-PEP attained 16%-30% higher PEPA than pleiotropic or epigenomic partitioning approaches, with larger improvements for well-powered traits, consistent with simulations; these gains arise from J-PEP's tendency to upweight signals present in both auxiliary trait and tissue data, emphasizing shared components. Notably, integrating single-cell chromatin accessibility data refined bulk-based clusters, enhancing cell-type resolution and specificity. For type 2 diabetes, hypertension, and other diseases/traits, J-PEP clusters recapitulated known pathways while revealing underexplored biological processes.
{"title":"Mapping disease loci to biological processes via joint pleiotropic and epigenomic partitioning.","authors":"Gaspard Kerner, Nolan Kamitaki, Benjamin Strober, Alkes L Price","doi":"10.1016/j.xgen.2025.101138","DOIUrl":"10.1016/j.xgen.2025.101138","url":null,"PeriodicalId":72539,"journal":{"name":"Cell genomics","volume":" ","pages":"101138"},"PeriodicalIF":11.1,"publicationDate":"2026-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146069170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Histology images offer a cost-effective approach to predicting cellular phenotypes using spatial transcriptomics. However, existing methods struggle with individual gene expression accuracy and lack the capability to predict fine-grained transcriptional cell types. We present Hist2Cell, a vision graph-transformer framework to accurately resolve fine-grained cell types directly from histology images. Trained on human lung and breast cancer datasets, Hist2Cell predicts cell-type abundance with high accuracy (Pearson correlation over 0.80) and captures cellular colocalization. Moreover, it generalizes to large-scale The Cancer Genome Atlas (TCGA) cohorts without re-training, facilitating survival prediction by revealing distinct tissue microenvironments and cell type-patient mortality relationships. Thus, Hist2Cell enables cost-efficient analysis for large-scale spatial biology studies and precise cancer prognosis.
{"title":"Hist2Cell: Deciphering fine-grained cellular architectures from histology images.","authors":"Weiqin Zhao, Zhuo Liang, Xianjie Huang, Yuanhua Huang, Lequan Yu","doi":"10.1016/j.xgen.2025.101137","DOIUrl":"10.1016/j.xgen.2025.101137","url":null,"PeriodicalId":72539,"journal":{"name":"Cell genomics","volume":" ","pages":"101137"},"PeriodicalIF":11.1,"publicationDate":"2026-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146069192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-22 DOI: 10.1016/j.xgen.2025.101066
Alok Jaiswal, Tristan Kooistra, Vladislav Pokatayev, Hélder N Bastos, Rita F Santos, Tresa R Sarraf, Åsa Segerstolpe, Crystal Lin, Liat Amir-Zilberstein, Shaina Twardus, Kevin Shannon, Shane P Murphy, Rachel Knipe, Ingo K Ganzleben, Katharine E Black, Toni M Delorey, Daniel B Graham, Yin P Hung, Lida P Hariri, Jacques Deguine, Agostinho Carvalho, Benjamin D Medoff, Ramnik J Xavier
Interstitial lung diseases (ILD) are characterized by fibrotic scarring of the lung parenchyma with remarkably unfavorable prognosis. Using single-nucleus RNA sequencing and spatial transcriptomics, we generated a comprehensive cellular network of the distal lung and its alterations in fibrosis. Integration with histopathology revealed that the transformation of normal parenchyma into fibrotic tissue is accompanied by ectopic bronchiolization and decellularization. Areas of active fibrosis were characterized by co-localization of pro-fibrotic CTHRC1-hi fibroblasts and aberrant transitional epithelial cells. We modeled this maladaptive differentiation of alveolar epithelial cells using organoids, demonstrating that all three pro-inflammatory ligands present in this pathogenic niche, TGF-β, IL-1β, and TNF-α, are jointly required for their induction. Additionally, we identified a requirement for the transcription factor NFATC4 during myofibroblast differentiation driven by soluble factors or mechanosensing. Collectively, this work identifies essential molecular drivers of the cellular interactions underlying lung fibrosis.
{"title":"Spatial transcriptomics reveals altered communities and drivers of aberrant epithelia and pro-fibrotic fibroblasts in interstitial lung diseases.","authors":"Alok Jaiswal, Tristan Kooistra, Vladislav Pokatayev, Hélder N Bastos, Rita F Santos, Tresa R Sarraf, Åsa Segerstolpe, Crystal Lin, Liat Amir-Zilberstein, Shaina Twardus, Kevin Shannon, Shane P Murphy, Rachel Knipe, Ingo K Ganzleben, Katharine E Black, Toni M Delorey, Daniel B Graham, Yin P Hung, Lida P Hariri, Jacques Deguine, Agostinho Carvalho, Benjamin D Medoff, Ramnik J Xavier","doi":"10.1016/j.xgen.2025.101066","DOIUrl":"10.1016/j.xgen.2025.101066","url":null,"PeriodicalId":72539,"journal":{"name":"Cell genomics","volume":" ","pages":"101066"},"PeriodicalIF":11.1,"publicationDate":"2026-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146042223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Metagenomics has enabled the understanding of the microbial composition and functional potential in various environments. Using laser-induced forward transfer (LIFT) technology, we report high-quality microbial single-cell genomes or transcriptomes in complex samples such as mouse gut, human saliva, and tumor sections. Bacterial cells in close proximity to each other or to host cells could be directly analyzed using this single-cell approach. Bacterial cells in mice or human samples could be fluorescently labeled for single-cell visualization before collection. The high-quality single-cell transcriptome results allow us to delineate cell-fate commitment in Bacillus sporulation and preliminarily characterize gene expression from Bacteroides in a colorectal cancer sample. The method is scalable and precise and empowers insights about microbial populations and single-cell interactions with the host.
{"title":"Microbial single-cell omics in situ.","authors":"Xihong Lan, Qiaoxing Liang, Jinhua He, Jiayi Wu, Xiaoying Zhang, Fei Li, Lili Li, Guoping Zhao, Ruidong Guo, Huijue Jia","doi":"10.1016/j.xgen.2025.101128","DOIUrl":"10.1016/j.xgen.2025.101128","url":null,"PeriodicalId":72539,"journal":{"name":"Cell genomics","volume":" ","pages":"101128"},"PeriodicalIF":11.1,"publicationDate":"2026-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146042260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-16 DOI: 10.1016/j.xgen.2025.101126
Aviya Litman, Zhicheng Pan, Ksenia Sokolova, Joyce Fang, Tess Marvin, Natalie Sauerwald, Christopher Y Park, Chandra L Theesfeld, Olga G Troyanskaya
In eukaryotes, most genes produce multiple transcript isoforms that diversify the transcriptome and proteome, serving as a key mechanism of functional regulation. Genetic variation can disrupt the RNA processing signals that shape isoform structure and abundance, yet modeling these effects at full-length isoform resolution remains challenging due to the complexity of transcript regulation. Here, we introduce Otari, an attention-based graph neural network framework trained on the human genomic sequence and long-read transcriptomes across 30 tissue types and brain regions. Otari predicts tissue-specific differential isoform abundance by integrating sequence-derived epigenetic and post-transcriptional signals, enabling isoform-resolved variant effect interpretation. Applied to large-scale variant datasets, including an autism cohort, Otari uncovers patterns of isoform dysregulation undetectable at the gene level, such as variant-driven perturbations in isoform abundance and microexon usage implicated in autism pathophysiology. We provide Otari as a resource for powering isoform-level analyses across tissues at scale.
{"title":"Variant-resolved prediction of context-specific isoform variation with a graph-based attention model.","authors":"Aviya Litman, Zhicheng Pan, Ksenia Sokolova, Joyce Fang, Tess Marvin, Natalie Sauerwald, Christopher Y Park, Chandra L Theesfeld, Olga G Troyanskaya","doi":"10.1016/j.xgen.2025.101126","DOIUrl":"10.1016/j.xgen.2025.101126","url":null,"PeriodicalId":72539,"journal":{"name":"Cell genomics","volume":" ","pages":"101126"},"PeriodicalIF":11.1,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145994566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-14 DOI: 10.1016/j.xgen.2025.101134
Jill A Hollenbach
The immunoglobulin heavy chain constant (IGHC) locus houses genetic determinants of antibody function and specificity. In this issue of Cell Genomics, Jana et al. use long-read sequencing to characterize extensive inter-individual diversity in the IGHC region across ancestrally diverse populations, highlighting potential functional consequences.
{"title":"Uncovering diversity in the immunoglobulin heavy chain locus.","authors":"Jill A Hollenbach","doi":"10.1016/j.xgen.2025.101134","DOIUrl":"https://doi.org/10.1016/j.xgen.2025.101134","url":null,"PeriodicalId":72539,"journal":{"name":"Cell genomics","volume":"6 1","pages":"101134"},"PeriodicalIF":11.1,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145992024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-14 Epub Date: 2025-10-10 DOI: 10.1016/j.xgen.2025.101036
Minwoo Pak, Mirca S Saurty-Seerunghen, Kellie Wise, Tsega-Ab Abera, Chhiring Lama, Neelang Parghi, Ted Kang, Xiaotian Sun, Qi Gao, Liming Bao, Mikhail Roshal, John N Allan, Richard R Furman, Luciano G Martelotto, Anna S Nam
Somatic evolution leads to clonal heterogeneity, which fuels cancer progression and therapy resistance. To decipher the consequences of clonal heterogeneity, we require a method that deconvolutes complex clonal architectures and their downstream transcriptional states. We developed Genotyping of Transcriptomes for multiple targets and sample types (GoT-Multi), a high-throughput, formalin-fixed paraffin-embedded (FFPE) tissue-compatible single-cell multi-omics for co-detection of multiple somatic genotypes and whole transcriptomes. We developed an ensemble-based machine learning pipeline to optimize genotyping. We applied GoT-Multi to frozen or FFPE samples of Richter transformation, a progression of chronic lymphocytic leukemia to therapy-resistant large B cell lymphoma. GoT-Multi detected heterogeneous cancer cell states with genotypic data of 27 mutations, enabling clonal architecture reconstruction linked with their transcriptional programs. Distinct subclonal genotypes, including therapy-resistant mutations, converged on an inflammatory state. Other subclones displayed enhanced proliferation and/or MYC program. Thus, GoT-Multi revealed that distinct genotypic identities may converge on similar transcriptional states to mediate therapy resistance.
{"title":"Co-mapping clonal and transcriptional heterogeneity in somatic evolution via GoT-Multi.","authors":"Minwoo Pak, Mirca S Saurty-Seerunghen, Kellie Wise, Tsega-Ab Abera, Chhiring Lama, Neelang Parghi, Ted Kang, Xiaotian Sun, Qi Gao, Liming Bao, Mikhail Roshal, John N Allan, Richard R Furman, Luciano G Martelotto, Anna S Nam","doi":"10.1016/j.xgen.2025.101036","DOIUrl":"10.1016/j.xgen.2025.101036","url":null,"PeriodicalId":72539,"journal":{"name":"Cell genomics","volume":" ","pages":"101036"},"PeriodicalIF":11.1,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145276872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-14 Epub Date: 2025-12-02 DOI: 10.1016/j.xgen.2025.101079
Zikun Yang, Lu Zhang, Xinrui Jiang, Xiangyu Yang, Kaiyue Ma, DongAhn Yoo, Yong Lu, Shilong Zhang, Jieyi Chen, Yanhong Nie, Xinyan Bian, Junmin Han, Lianting Fu, Juan Zhang, Mario Ventura, Guojie Zhang, Qiang Sun, Evan E Eichler, Yafei Mao
All great apes differ karyotypically from humans due to the fusion of chromosomes 2a and 2b, resulting in human chromosome 2. Here, we show that the fusion was associated with multiple pericentric inversions, segmental duplications (SDs), and the turnover of subterminal repetitive DNA. We characterized the fusion site at the single-base-pair resolution and identified three distinct SDs that originated more than 5 million years ago. These three distinct SDs were differentially distributed among African great apes as a result of incomplete lineage sorting (ILS) and lineage-specific duplication. One of these SDs shares homology to a hypomethylated SD spacer sequence present in the subterminal heterochromatin of Pan but is completely absent subtelomerically in both humans and orangutans. CRISPR-Cas9-mediated depletion of the fusion site in human neural progenitor cells alters the expression of genes, indicating a potential regulatory consequence to this human-specific karyotypic change. Overall, this study offers insights into how complex regions subject to ILS may contribute to speciation.
{"title":"Incomplete lineage sorting of segmental duplications defines the human chromosome 2 fusion site early during African great ape speciation.","authors":"Zikun Yang, Lu Zhang, Xinrui Jiang, Xiangyu Yang, Kaiyue Ma, DongAhn Yoo, Yong Lu, Shilong Zhang, Jieyi Chen, Yanhong Nie, Xinyan Bian, Junmin Han, Lianting Fu, Juan Zhang, Mario Ventura, Guojie Zhang, Qiang Sun, Evan E Eichler, Yafei Mao","doi":"10.1016/j.xgen.2025.101079","DOIUrl":"10.1016/j.xgen.2025.101079","url":null,"PeriodicalId":72539,"journal":{"name":"Cell genomics","volume":" ","pages":"101079"},"PeriodicalIF":11.1,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145672800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-14 Epub Date: 2025-10-10 DOI: 10.1016/j.xgen.2025.101037
Pavla Navratilova, Simon Pavlu, Zihao Zhu, Zuzana Tulpova, Ondrej Kopecky, Petr Novak, Nils Stein, Hana Simkova
Regulation of transcription initiation is the ground level of modulating gene expression during plant development. This process relies on interactions between transcription factors and cis-regulatory elements (CREs), which become promising targets for crop bioengineering. To annotate CREs in the barley genome and understand mechanisms of distal regulation, we profiled several epigenetic features across three stages of barley embryo and leaves and performed HiChIP to identify activating and repressive genomic interactions. Using machine learning, we integrated the data into seven chromatin states, predicting ∼77,000 CRE candidates, collectively representing 1.43% of the barley genome. Identified genomic interactions, often spanning multiple genes, linked thousands of predicted CREs with their putative targets and revealed notably frequent promoter-promoter contacts. Using the LEA gene family as an example, we discuss possible roles of these interactions in transcription regulation. On the Vrn3 gene, we demonstrate the potential of our datasets to predict CREs for other developmental stages.
{"title":"Epigenome and interactome profiling uncovers principles of distal regulation in the barley genome.","authors":"Pavla Navratilova, Simon Pavlu, Zihao Zhu, Zuzana Tulpova, Ondrej Kopecky, Petr Novak, Nils Stein, Hana Simkova","doi":"10.1016/j.xgen.2025.101037","DOIUrl":"10.1016/j.xgen.2025.101037","url":null,"PeriodicalId":72539,"journal":{"name":"Cell genomics","volume":" ","pages":"101037"},"PeriodicalIF":11.1,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145276854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}