Pub Date: 2026-02-28 DOI: 10.1186/s13059-026-04015-z
Lieke Michielsen, Justine Hsu, Anoushka Joglekar, Natan Belchikov, Marcel J T Reinders, Hagen U Tilgner, Ahmed Mahfouz
Background: Alternative splicing contributes to molecular diversity across brain cell types. RNA-binding proteins (RBPs) regulate splicing, but the genome-wide mechanisms underlying cell-type-specific splicing remain poorly understood.
Results: Here, we aim to unravel cell-type-specific splicing mechanisms by using RBP binding sites and/or the genomic sequence to predict exon inclusion in neurons and glia, as measured by long-read single-cell data in the human hippocampus and frontal cortex. We found that inclusion of variable exons is harder to predict in neurons than in glia in both brain regions. Comparing neurons and glia, the positions of RBP binding sites in alternatively spliced exons differ more from those in non-variable exons in neurons, indicating distinct splicing mechanisms. Model interpretation pinpointed RBPs, including QKI, that potentially regulate alternative splicing between neurons and glia. Finally, we accurately predict and prioritize the effect of splicing QTLs.
Conclusions: Our results indicate that the splicing mechanisms of variable exons in neurons have diverged more from the standard mechanisms. Splicing in neurons might be less sequence-dependent and influenced more by, for instance, chromatin accessibility or methylation. Taken together, these results provide new insights into the mechanisms regulating cell-type-specific alternative splicing in the brain.
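The quantity being predicted here, exon inclusion, is commonly summarized as percent spliced in (PSI). The following is a minimal sketch of that calculation, assuming per-cell-type counts of reads that include or skip an exon; the function names and example counts are hypothetical and this is not the authors' pipeline.

```python
def psi(inclusion_reads: int, exclusion_reads: int) -> float:
    """Fraction of informative reads that include the exon (0..1)."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        # No read spans this exon, so inclusion cannot be estimated.
        raise ValueError("no informative reads for this exon")
    return inclusion_reads / total


def delta_psi(neuron_counts, glia_counts) -> float:
    """Difference in exon inclusion between neurons and glia.

    Each argument is an (inclusion_reads, exclusion_reads) pair.
    """
    return psi(*neuron_counts) - psi(*glia_counts)


# Hypothetical counts for one exon, for illustration only.
neuron = (25, 75)
glia = (75, 25)
print(delta_psi(neuron, glia))
```

A large absolute difference between cell types is one way an exon could be flagged as variable; any threshold for that call would be an additional assumption not stated in the abstract.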
Title: Decoding exon inclusion in the human brain reveals more divergent splicing mechanisms in neurons than glia. Journal: Genome Biology
Pub Date: 2026-02-27 DOI: 10.1186/s13059-026-04014-0
Kunpeng Li, Jie Li, Ran Zhao, Liuyu Qin, Yu Wang, Jieling Ren, Lei Ke, Jianyu Wang, Xin Yi, Yue Zhou, Yuannian Jiao
Background: Aristolochia fimbriata (A. fimbriata) is a magnoliid species that, like Amborella trichopoda, has not undergone additional whole genome duplications since the origin of extant flowering plants. Owing to its low genetic redundancy and suitability for large-scale cultivation, A. fimbriata is an exceptional reference and potential model species for comparative and functional genomic studies of angiosperm evolution.
Results: Here, we present a complete telomere-to-telomere (T2T) genome assembly of A. fimbriata and characterize its centromeric architecture and epigenetic landscape. Our analysis reveals remarkably short (34-bp) and highly homogenized satellite monomers in its centromeric regions. Furthermore, we identify approximately 1,020 topologically associating domain-like structures and 23,852 non-redundant accessible chromatin regions. Notably, over 50% of accessible chromatin regions participate in long-range chromatin loops that bypass at least one intervening gene, suggesting widespread distal gene regulation in this species. We also demonstrate that an expanded downstream regulatory network of the floral B-class gene APETALA3 (AP3) may contribute to the highly specialized floral features in A. fimbriata.
Conclusion: Our study not only elucidates the unique centromeric organization and three-dimensional epigenomic architecture of A. fimbriata, but also provides valuable genomic resources for investigating how regulatory network evolution drives phenotypic innovation in flowering plants.
Title: Centromere organization and epigenetic regulation in Aristolochia fimbriata. Journal: Genome Biology
Pub Date: 2026-02-27 DOI: 10.1186/s13059-026-04013-1
Tauras P Vilgalys, Jordan A Anderson, Arielle S Fogel, Dana Lin, Elizabeth A Archie, Susan C Alberts, Jenny Tung
Background: Hybrid zones play a central role in evolutionary biology because they serve as natural laboratories for studying how traits and taxa diverge. Changes in gene regulation make important contributions to this process. However, the degree to which admixture shapes gene regulatory variation in hybrid populations remains poorly understood. Here, we combine genome-wide resequencing and DNA methylation data from 295 hybrid baboons (members of a single, intensively studied natural population) to investigate how admixture affects the genetic architecture of this important epigenetic mark.
Results: We find that local genetic ancestry frequently predicts DNA methylation levels and recapitulates differences between the parent species. By performing methylation quantitative trait locus mapping, we show that these differences predominantly arise due to evolved differences in allele frequencies. Thus, admixture in the hybrid population increases variance in DNA methylation, including by introducing genetic variants affecting DNA methylation that would otherwise be invariant. Finally, we integrate massively parallel reporter assay data to show that admixture-derived variation in DNA methylation alters enhancer activity and gene expression.
Conclusions: Together, these results demonstrate how admixture can meaningfully alter the genetic architecture of gene regulatory traits in natural hybrid zones. They also suggest that the genetic architecture of DNA methylation is conserved across closely related primates, implying that genetic effects on gene regulation may remain stable over timescales that range into the millions of years.
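Methylation quantitative trait locus mapping, as mentioned in the Results, tests whether genotype at a variant predicts methylation at a site. The sketch below shows the simplest form of that idea, an ordinary least-squares slope of methylation on allele dosage. It is an illustration under stated assumptions (no covariates, no relatedness or ancestry correction, hypothetical function name and data), not the model used in the study.

```python
from statistics import mean


def meqtl_slope(genotypes, methylation):
    """Least-squares effect of allele dosage (0, 1, 2) on methylation level."""
    g_bar = mean(genotypes)
    m_bar = mean(methylation)
    sxx = sum((g - g_bar) ** 2 for g in genotypes)
    if sxx == 0:
        # A variant with no genotype variation has no estimable effect.
        raise ValueError("genotype is invariant; effect cannot be estimated")
    sxy = sum((g - g_bar) * (m - m_bar) for g, m in zip(genotypes, methylation))
    return sxy / sxx


# Hypothetical dosages and methylation fractions for six individuals.
dosage = [0, 1, 2, 0, 1, 2]
meth = [0.2, 0.3, 0.4, 0.2, 0.3, 0.4]
print(meqtl_slope(dosage, meth))
```

The invariant-genotype check mirrors the point made in the abstract: a variant that is fixed in a population contributes no methylation variance there, and only becomes mappable once admixture introduces the alternative allele.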
Title: Admixture influences the genetic architecture of DNA methylation in a wild primate hybrid zone. Journal: Genome Biology
Pub Date: 2026-02-26 DOI: 10.1186/s13059-026-04010-4
Shilpa Rao, Aden Y Le, Logan Persyn, Can Cenik
Background: Translational buffering refers to the regulation of ribosome occupancy to offset the effects of transcriptional variation. While previous studies have primarily investigated translational buffering in yeast under genetic variation or environmental stress, it remains unclear how widespread this is across mammalian genes in various cellular contexts.
Results: We performed a uniform analysis of 1,515 matched ribosome profiling and RNA-seq datasets from humans and mice. This resource enabled us to assess translational buffering through comparative analysis of variation in ribosome occupancy and RNA expression, and by examining the relationship between mRNA abundance and translation efficiency. We found that translational buffering is partly conserved between humans and mice; homologous genes showed moderate cross-species correlation in mRNA-translation efficiency relationships and strong enrichment of shared buffered genes, particularly those encoding ribosomal, RNA-binding, and proteasomal proteins. Although identified buffered genes associate with specific sequence features, these alone are insufficient to predict translational buffering, highlighting the importance of cellular context. Genes exhibiting translational buffering show lower variation in protein abundance in cancer cell lines and tissues. We also observed that translationally buffered genes are more likely to be haploinsufficient and triplosensitive, suggesting a demand for stringent dosage limits.
Conclusions: We hypothesize two models of translational buffering, namely the "differential accessibility model" and the "translation initiation rate model", suggesting that different transcripts align with one or the other. Our study explores the translational buffering potential of genes across diverse conditions, elucidates their distinctive features, and provides insights into the mechanisms driving this effect.
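The two comparisons described in the Results can be sketched for a single gene as follows: translation efficiency (TE) taken as ribosome occupancy divided by mRNA abundance, and a ratio of variability in ribosome occupancy to variability in RNA across samples. The exact metrics used in the study are not given in the abstract, so the log2 scale, the standard-deviation ratio, the function names, and the example values are all assumptions.

```python
import math
from statistics import stdev


def log2_te(ribo, rna):
    """Per-sample log2 translation efficiency: ribosome occupancy / mRNA abundance."""
    return [math.log2(r / m) for r, m in zip(ribo, rna)]


def variation_ratio(ribo, rna):
    """SD of log2 ribosome occupancy divided by SD of log2 RNA abundance.

    Values below 1 mean ribosome occupancy varies less than RNA across
    samples, which is the pattern expected for a buffered gene.
    """
    sd_ribo = stdev([math.log2(x) for x in ribo])
    sd_rna = stdev([math.log2(x) for x in rna])
    if sd_rna == 0:
        raise ValueError("RNA abundance does not vary across samples")
    return sd_ribo / sd_rna


# Hypothetical normalized abundances for one gene in three samples.
rna = [1.0, 4.0, 16.0]
ribo = [2.0, 4.0, 8.0]
print(variation_ratio(ribo, rna))
print(log2_te(ribo, rna))
```

In this toy gene, TE falls as mRNA abundance rises, so the ribosome occupancy changes less than the transcript level does.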
Title: Translational buffering tunes gene expression in mice and humans. Journal: Genome Biology
Pub Date: 2026-02-26 DOI: 10.1186/s13059-026-04008-y
Simon Chasles, Zakary Gaillard-Duchassin, Jordan Quenneville, Mélanie Lemaire, Etienne Gagnon, François Major
RIMap-RISC is a web-accessible database for transcriptome-wide modeling of human microRNA (miRNA) targeting. It computes plausible transcript-miRNA interactions and records their position, duplex secondary structure, free energy, site classification, dissociation constant, target accessibility, and evolutionary conservation. RIMap-RISC supports transcript-wide queries and allows users to explore and export interaction data through an interactive interface or RESTful API programmatic access. Unlike existing tools, RIMap-RISC integrates duplex-structure prediction within a biophysical framework modeling the bipartite architecture of RISC, accommodating a bridge between seed and supplementary pairing. A novel, unambiguous, miRNA-centric nomenclature for interaction types is also introduced.
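Two of the recorded quantities, free energy and dissociation constant, are linked by the standard thermodynamic relation ΔG° = RT ln(Kd) for a 1 M standard state. The sketch below converts between them at 37 °C. Whether RIMap-RISC derives its dissociation constants this way is not stated in the abstract, so treat the function names and the default temperature as assumptions for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_from_delta_g(delta_g_kcal: float, temp_k: float = 310.15) -> float:
    """Dissociation constant (M) implied by a binding free energy (kcal/mol)."""
    return math.exp(delta_g_kcal / (R_KCAL * temp_k))


def delta_g_from_kd(kd_molar: float, temp_k: float = 310.15) -> float:
    """Binding free energy (kcal/mol) implied by a dissociation constant (M)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temp_k * math.log(kd_molar)


# A more negative free energy corresponds to tighter binding (smaller Kd).
for dg in (-6.0, -10.0, -14.0):
    print(dg, kd_from_delta_g(dg))
```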
Title: RIMap-RISC: a transcriptome-wide database of structurally modeled human microRNA interactions. Journal: Genome Biology
Pub Date: 2026-02-25 DOI: 10.1186/s13059-026-03988-1
David Wissel, Madison M Mehlferber, Khue M Nguyen, Vasilii Pavelko, Elizabeth Tseng, Mark D Robinson, Gloria M Sheynkman
Background: The assembly of fragmented RNA-sequencing reads into complete transcripts is error-prone, particularly for genes with complex splicing, resulting in ambiguity in transcript discovery and quantification. PacBio long-read RNA sequencing resolves transcripts with greater clarity than short-read technologies. PacBio Kinnex employs a cDNA concatenation approach that increases read yield on average by 8-fold relative to previous protocols. However, its quantitative performance remains under-evaluated at scale.
Results: Here, we benchmark the high-throughput PacBio Kinnex platform against Illumina short-read RNA-seq using matched, deeply sequenced datasets across a time course of endothelial cell differentiation. Compared to Illumina, Kinnex achieves comparable gene-level quantification and more accurate transcript discovery and transcript quantification. While Illumina detects more transcripts overall, many reflect potentially unstable or ambiguous estimates in complex genes. Kinnex largely avoids these issues, producing more reliable differential transcript expression calls, despite a mild bias against short transcripts (shorter than 1.25 kb). After correcting the Illumina estimates for inferential variability, Kinnex and Illumina quantifications are highly concordant, demonstrating equivalent performance. We also benchmark long-read tools, nominating Oarfish as the most efficient for our Kinnex data.
Conclusions: Together, our results establish Kinnex as a reliable platform for full-length transcript quantification.
Title: A systematic benchmark of high-accuracy PacBio long-read RNA sequencing for transcript-level quantification. Journal: Genome Biology
Pub Date: 2026-02-25 DOI: 10.1186/s13059-026-04012-2
Subina Mehta, Reid Wagner, Katherine T Do, James E Johnson, Fengchao Yu, Tyler Jubenville, Kyle Richards, Suzanne Coleman, Flavia E Popescu, Alexey I Nesvizhskii, David A Largaespada, Pratik D Jagtap, Timothy J Griffin
Characterizing tumor-specific neoantigen peptides, derived from genomic or transcriptomic aberrations and presented to the immune system, is critical for immuno-oncology studies. To this end, the modular iPepGen immunopeptidogenomics pipeline provides these functions: (1) neoantigen prediction and protein database generation from genomic or transcriptomic sequencing data; (2) peptide identification; (3) verification from immunopeptidomic mass spectral data; (4) neoantigen classification and visualization; (5) candidate prioritization for further study. Easy access via a publicly available, scalable cloud-based gateway, coupled with online, interactive training materials, streamlines adoption by cancer researchers who require immunopeptidogenomic analysis tools but lack advanced computational expertise and resources.
Title: iPepGen: a modular, immunopeptidogenomic analysis pipeline for discovery, verification, and prioritization of cancer peptide neoantigen candidates. Journal: Genome Biology
Pub Date: 2026-02-25 DOI: 10.1186/s13059-026-03998-z
Yang Liu, Lu Zhou, Xiawei Du, Ruikun He, Xuguang Zhang, Rongbo Shen, Yixue Li
Background: The surge in single-cell omics data exposes limitations in traditional, manually defined analysis workflows. AI agents offer a paradigm shift, enabling adaptive planning, executable code generation, traceable decisions, and real-time knowledge fusion. However, the lack of a comprehensive benchmark critically hinders progress.
Results: We introduce a novel benchmarking evaluation system to rigorously assess agent capabilities in single-cell omics analysis. This system comprises: a unified platform compatible with diverse agent frameworks and LLMs; multidimensional metrics assessing cognitive program synthesis, collaboration, execution efficiency, bioinformatics knowledge integration, and task completion quality; and 50 diverse real-world single-cell omics analysis tasks spanning multi-omics, species, and sequencing technologies. Our evaluation reveals that Grok3-beta achieves state-of-the-art performance among tested agent frameworks. Multi-agent frameworks significantly enhance collaboration and execution efficiency over single-agent approaches through specialized role division. Attribution analyses of agent capabilities show that high-quality code generation is crucial for task success and that self-reflection has the most significant overall impact, followed by retrieval-augmented generation (RAG) and planning.
Conclusions: This work highlights persistent challenges in code generation, long-context handling, and context-aware knowledge retrieval, providing a critical empirical foundation and best practices for developing robust AI agents in computational biology.
Title: Benchmarking LLM-based agents for single-cell omics analysis. Journal: Genome Biology
Pub Date: 2026-02-25 DOI: 10.1186/s13059-026-04006-0
Shuo Li, Weihua Zeng, Wenyuan Li, Chun-Chi Liu, Yonggang Zhou, Xiaohui Ni, Mary L Stackpole, Angela H Yeh, Andrew Melehy, David S Lu, Steven S Raman, William Hsu, Lopa Mishra, Kirti Shetty, Benjamin Tran, Megumi Yokomizo, Preeti Ahuja, Yazhen Zhu, Hsian-Rong Tseng, Denise R Aberle, Vatche G Agopian, Steven-Huy B Han, Samuel W French, Steven M Dubinett, Xianghong Jasmine Zhou, Wing Hung Wong
Background: Machine learning models in biomedical research are often hindered by demographic imbalances in clinical datasets, leading to biased predictions that disadvantage minority populations. Existing bias-correction methods face limitations in handling the heterogeneity of biomedical data and the complexity of demographic influences.
Results: We present DeBias, a computational framework for mitigating demographic biases in high-dimensional biomedical datasets. DeBias identifies and removes bias-associated subspaces from the feature space using control samples, enabling global correction of demographic distortions while preserving disease-specific signals. To evaluate its effectiveness, we apply DeBias to cell-free DNA methylation data for cancer detection. DeBias achieves a significant reduction in the number of features exhibiting demographic bias and outperforms existing methods in improving cancer detection performance for minority populations. Performance gains are validated in independent cohorts, highlighting the robustness of the approach.
Conclusions: DeBias offers an effective and generalizable strategy for correcting demographic biases in biomedical machine learning. It represents a step toward more equitable machine learning models that can deliver reliable and unbiased predictions across diverse patient populations.
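The idea of removing a bias-associated subspace estimated from control samples can be illustrated with a very small sketch. This is not the DeBias algorithm, whose details the abstract does not give. It assumes a single bias direction, taken as the difference in mean feature vectors between two demographic groups of controls, and projects every sample onto the orthogonal complement of that direction. All function names and the toy data are hypothetical.

```python
from math import sqrt


def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def bias_direction(controls_a, controls_b):
    """Unit vector along the mean difference between two groups of controls.

    Assumption for this sketch: demographic bias is captured by one direction.
    """
    mean_a = mean_vector(controls_a)
    mean_b = mean_vector(controls_b)
    diff = [a - b for a, b in zip(mean_a, mean_b)]
    norm = sqrt(sum(d * d for d in diff))
    if norm == 0:
        raise ValueError("control groups have identical means; nothing to remove")
    return [d / norm for d in diff]


def remove_direction(sample, direction):
    """Project a sample onto the orthogonal complement of a unit vector."""
    coeff = sum(s * d for s, d in zip(sample, direction))
    return [s - coeff * d for s, d in zip(sample, direction)]


# Hypothetical toy data: three features, of which only feature 0 shifts
# between the two demographic groups of control samples.
controls_group_a = [[1.0, 0.5, 0.2], [1.2, 0.4, 0.2]]
controls_group_b = [[0.0, 0.5, 0.2], [0.2, 0.4, 0.2]]

direction = bias_direction(controls_group_a, controls_group_b)
corrected = remove_direction([1.1, 0.9, 0.3], direction)
print(corrected)
```

In this toy case the group-dependent feature is zeroed out while the other two features are left unchanged. A real correction would need a multi-dimensional subspace and a check that disease-specific signal is preserved.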
{"title":"Reducing demographic bias in biomedical machine learning for cancer detection using cfDNA methylation.","authors":"Shuo Li, Weihua Zeng, Wenyuan Li, Chun-Chi Liu, Yonggang Zhou, Xiaohui Ni, Mary L Stackpole, Angela H Yeh, Andrew Melehy, David S Lu, Steven S Raman, William Hsu, Lopa Mishra, Kirti Shetty, Benjamin Tran, Megumi Yokomizo, Preeti Ahuja, Yazhen Zhu, Hsian-Rong Tseng, Denise R Aberle, Vatche G Agopian, Steven-Huy B Han, Samuel W French, Steven M Dubinett, Xianghong Jasmine Zhou, Wing Hung Wong","doi":"10.1186/s13059-026-04006-0","DOIUrl":"https://doi.org/10.1186/s13059-026-04006-0","url":null,"PeriodicalId":48922,"journal":{"name":"Genome Biology","volume":" ","pages":""},"PeriodicalIF":12.3,"publicationDate":"2026-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147285940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-02-21 DOI: 10.1186/s13059-026-03999-y
Ali Osman Berk Şapcı, Siavash Mirarab
Comparing each sequencing read in a sample to a reference database is a fundamental step in wide-ranging applications. Results of these comparisons can enable phylogenetic characterization. However, phylogenetic placement is currently only possible at scale for marker genes, a small fraction of the genome. We introduce krepp, an alignment-free k-mer-based method that enables placing reads from anywhere on the genome on an ultra-large reference phylogeny (e.g., 123,853 leaves). We show that krepp is scalable and computes accurate distances that approximate those using alignments, leading to accurate placements. These precise phylogenetic identifications improve our ability to compare and characterize metagenomic samples.
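As a rough illustration of how k-mer matches can yield a read-to-reference distance without alignment, the sketch below computes a toy binomial maximum-likelihood estimate of the per-base mismatch rate. It is not krepp's estimator, which the abstract does not specify. It assumes brute-force matching of each read k-mer to its nearest reference k-mer, a hypothetical Hamming-distance cutoff `max_hd`, and independence between overlapping k-mers. Unmatched k-mers are ignored. All names are hypothetical.

```python
def kmers(seq, k):
    """All overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def estimate_distance(read, reference_kmers, k, max_hd):
    """Toy binomial MLE of the per-base mismatch rate.

    Each read k-mer is matched to its closest reference k-mer. K-mers whose
    best match exceeds max_hd are skipped. Under the simplifying assumption
    that every position of every matched k-mer mismatches independently with
    probability p, the MLE of p is total mismatches / total positions.
    Returns None if no k-mer is matched.
    """
    total_positions = 0
    total_mismatches = 0
    for km in kmers(read, k):
        best = min(hamming(km, ref) for ref in reference_kmers)
        if best <= max_hd:
            total_positions += k
            total_mismatches += best
    if total_positions == 0:
        return None
    return total_mismatches / total_positions


# Hypothetical toy input: the read differs from the reference at one base.
reference_kmers = set(kmers("ACGTACGTAC", 4))
estimate = estimate_distance("ACGTTCGT", reference_kmers, k=4, max_hd=1)
unmatched = estimate_distance("TTTTTTTT", reference_kmers, k=4, max_hd=1)
print(estimate, unmatched)
```

Because one substitution is seen by several overlapping k-mers, this toy estimate overstates the true divergence of the read. A practical method would need an index instead of the brute-force search and a likelihood that accounts for overlap and for unmatched k-mers.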
{"title":"krepp: a k-mer-based maximum pseudo-likelihood method for estimating read distances and genome-wide phylogenetic placement.","authors":"Ali Osman Berk Şapcı, Siavash Mirarab","doi":"10.1186/s13059-026-03999-y","DOIUrl":"10.1186/s13059-026-03999-y","url":null,"PeriodicalId":48922,"journal":{"name":"Genome Biology","volume":" ","pages":""},"PeriodicalIF":12.3,"publicationDate":"2026-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146776886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}