Pub Date: 2026-02-17; DOI: 10.1016/j.xgen.2026.101165
Cheuk-Ting Law, Kathleen H Burns
Long interspersed element-1 (LINE-1, L1) retrotransposons are the most abundant protein-coding transposable elements (TEs) in mammalian genomes and have shaped genome content over 170 million years of evolution. LINE-1 is self-propagating and mobilizes other sequences, including Alu elements. Occasionally, LINE-1 forms chimeric insertions with non-coding RNAs and mRNAs, but there are no comprehensive catalogs of LINE-1 chimeras. To address this, we developed timing mobile element insertions (TiMEstamp), a computational pipeline that leverages multiple sequence alignments (MSAs) to estimate the age of LINE-1 insertions and identify candidate chimeric insertions where an adjacent sequence arrives contemporaneously. With this pipeline, we discovered new chimeric insertions involving small RNAs, Alu elements, and mRNA fragments. Additionally, we saw evidence that LINE-1 loci with defunct promoters can acquire regulatory elements from nearby genes to restore expression and retrotransposition activity. These discoveries highlight the recombinatory potential of LINE-1 RNA with implications for genome evolution, TE domestication, and somatic retrotransposition.
Title: Comparative genomics reveals LINE-1 recombination with diverse RNAs. Journal: Cell Genomics, article 101165.
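The LINE-1 abstract above describes dating insertions from multiple sequence alignments. The sketch below is not the TiMEstamp algorithm; it only illustrates the general presence/absence idea under a simple parsimony assumption (no lineage-specific deletions). The function name, the species labels, and the divergence values are all hypothetical placeholders.

```python
def bracket_insertion_age(species_by_divergence, has_insertion):
    """Bracket an insertion's age from presence/absence in aligned species.

    species_by_divergence: list of (name, divergence_from_reference_mya),
        sorted from the closest to the most distant relative.
    has_insertion: dict mapping name -> True, False, or None
        (None means the alignment is uninformative at this locus).
    Returns (min_age, max_age) in million years. max_age is None when
    every informative species carries the insertion.
    """
    min_age = 0.0
    for name, divergence in species_by_divergence:
        state = has_insertion.get(name)
        if state is None:
            continue  # gap or missing assembly: skip this species
        if state:
            min_age = divergence  # shared, so it predates this split
        else:
            return (min_age, divergence)  # absent, so it postdates this split
    return (min_age, None)


# Placeholder species and divergence times, invented for illustration only.
species = [("species_A", 7.0), ("species_B", 30.0), ("species_C", 90.0)]
calls = {"species_A": True, "species_B": True, "species_C": False}
print(bracket_insertion_age(species, calls))
```

Under this toy scheme, a flanking sequence whose bracket matches that of the neighboring LINE-1 would be a candidate for having arrived at the same time; a real pipeline would also need to handle alignment errors and deletions.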
Pub Date: 2026-02-11; Epub Date: 2025-11-19; DOI: 10.1016/j.xgen.2025.101065
Tajda Klobučar, Jona Novljan, Ira A Iosub, Boštjan Kokot, Iztok Urbančič, D Marc Jones, Anob M Chakrabarti, Nicholas M Luscombe, Jernej Ule, Miha Modic
Complex RNA-protein networks play a pivotal role in the formation of many types of biomolecular condensates. How RNA features contribute to condensate formation, however, remains incompletely understood. Here, we integrate tailored transcriptomics assays to identify a distinct class of developmental condensation-prone RNAs termed "smOOPs" (semi-extractable, orthogonal-organic-phase-separation-enriched RNAs). These transcripts localize to larger intracellular foci, form denser RNA subnetworks than expected, and are heavily bound by RNA-binding proteins (RBPs). Using an explainable deep learning framework, we reveal that smOOPs harbor characteristic sequence composition, with lower sequence complexity, increased intramolecular folding, and specific RBP-binding patterns. Intriguingly, these RNAs encode proteins bearing extensive intrinsically disordered regions and are highly predicted to be involved in biomolecular condensates, indicating an interplay between RNA- and protein-based features in phase separation. This work advances our understanding of condensation-prone RNAs and provides a versatile resource to further investigate RNA-driven condensation principles.
Title: Integrative profiling of condensation-prone RNAs during early development. Journal: Cell Genomics, article 101065. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12903419/pdf/
Pub Date: 2026-02-11; DOI: 10.1016/j.xgen.2026.101168
Cleo L Bishop
How does senescent cell heterogeneity vary across different cell types in the liver in aging, fibrosis, and cancer? In Cell Genomics, Karpova and Li et al. reveal cell-type- and context-specific senescent cell signatures, offering the community a valuable resource and providing the potential for future therapeutic innovation.
Title: Liver senescence in focus: Heterogeneity across aging and cancer. Journal: Cell Genomics 6(2), article 101168. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12903356/pdf/
Pub Date: 2026-02-11; Epub Date: 2025-12-01; DOI: 10.1016/j.xgen.2025.101073
Sarah Chen, Aviv Regev, Anne Condon, Jiarui Ding
Single-cell RNA sequencing has provided new insights into both intracellular and intercellular processes. However, multiple processes, such as cell-type programs, differentiation, and the cell cycle, often occur simultaneously within one cell. Existing methods typically target a single process and impose restrictive assumptions, risking the loss of valuable biological information. We introduce CellUntangler, a deep generative model that embeds cells into a latent space composed of multiple subspaces, each tailored with an appropriate geometry to capture a distinct signal. Applied to datasets of cycling-only and mixed cycling/non-cycling cells, CellUntangler disentangles the cell cycle from other processes such as cell type. The framework generalizes to disentangle additional signals, including spatial, tissue dissociation, interferon response, and cell-type identity. By providing flexible embeddings to capture various signals, CellUntangler enables selective enhancement or filtering of signals at the gene-expression level, offering a powerful tool for disentangling complex biological processes in single-cell data.
Title: CellUntangler: Separating distinct biological signals in single-cell data with deep generative models. Journal: Cell Genomics, article 101073. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12903416/pdf/
Pub Date: 2026-02-11; Epub Date: 2026-01-26; DOI: 10.1016/j.xgen.2025.101131
Madison Chapel, Jessica Dennis, Carl G de Boer
Both rare and common genetic variants contribute to human disease, and emerging evidence suggests that they combine additively to influence disease liability. However, the non-linear relationship between disease liability and disease prevalence means that risk variants may have more severe phenotypic consequences in high-risk polygenic backgrounds and minimal impact in low-risk backgrounds, resulting in uneven selection across the population. As a result, selection coefficients may be better modeled as distributions that differ across populations, time, environments, and individuals than as single values. As the number of genes contributing to a trait and epistasis between alleles increases, so does phenotypic variance, pushing more individuals to extreme phenotypes and enhancing negative selection. Because disease-relevant phenotypes may be masked in certain genetic backgrounds, we argue that the polygenic background should be considered when designing experiments to characterize the molecular underpinnings of complex traits.
Title: Polygenic backgrounds influence phenotypic consequences of variants in cells, individuals, and populations. Journal: Cell Genomics, article 101131. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12903375/pdf/
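The non-linear link between liability and prevalence described in the abstract above can be illustrated with a standard liability-threshold model. This is a minimal sketch, not the authors' analysis: the threshold, the background values, and the variant effect are arbitrary illustrative numbers, and the residual is assumed to be standard normal.

```python
from statistics import NormalDist


def disease_risk(background_liability, variant_effect=0.0, threshold=2.0):
    """P(liability > threshold) for one individual.

    Liability = polygenic background + variant effect + N(0, 1) residual.
    All parameters are illustrative assumptions.
    """
    mean = background_liability + variant_effect
    return 1.0 - NormalDist(mu=mean, sigma=1.0).cdf(threshold)


# The same additive variant effect in two hypothetical polygenic backgrounds.
effect = 0.5
low = disease_risk(-1.0, effect) - disease_risk(-1.0)
high = disease_risk(1.0, effect) - disease_risk(1.0)
print(f"risk increase in low-risk background:  {low:.4f}")
print(f"risk increase in high-risk background: {high:.4f}")
```

Although the variant adds the same amount to liability in both cases, the change in disease probability is much larger in the high-risk background because that individual already sits closer to the threshold.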
Pub Date: 2026-02-11; Epub Date: 2026-01-22; DOI: 10.1016/j.xgen.2025.101127
Nicholas X Sloan, Jason Mares, Aidan C Daly, Shaunice Grier, Imdadul Haq, Christopher A Jackson, Natalie Barretto, Obadele Casel, Kristy Kang, Shruti Khiste, Kennedy Harris, Jacqueline Eschbach, Benjamin T Fullerton, Courteney Mattison, Brhan Gebremedhin, Joana Petrescu, Lilian Coie, Maria Hauge Pedersen, Ke Zhang, Jian Shu, Andrew F Teich, Hasini Reddy, Colin P Smith, Yousin Suh, Vilas Menon, Hemali Phatnani
We performed Visium spatial transcriptomics (ST) and single-nucleus RNA sequencing (snRNA-seq) on a cohort of nonpathological human tissues to uncover signatures of aging and senescence in the dorsolateral prefrontal cortex (dlPFC). In doing so, we identified gene expression changes characteristic of aged cortical layers. The cellular composition of the dlPFC also changed with age, with increased homeostatic astrocyte abundance and with decreased somatostatin (SST) inhibitory neurons. Nuclei from dlPFC cell types displayed a strong decline in oxidative phosphorylation- and cytoplasmic translation-related genes with age. Additionally, oligodendrocytes showed several hallmarks of senescence and a linear increase in CDKN2A expression with age. Combined analysis of ST and snRNA-seq datasets revealed astrocyte- and vascular cell-related gene expression programs in the white matter and layer 1 that were strongly enriched with age and for senescence-associated genes. These findings will help facilitate future studies exploring the role of senescent cell subpopulations in the aging brain.
Title: Uncovering the signatures of aging and senescence in the human dorsolateral prefrontal cortex. Journal: Cell Genomics, article 101127. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12903411/pdf/
Pub Date: 2026-02-11; Epub Date: 2025-12-01; DOI: 10.1016/j.xgen.2025.101076
Claudia Feng, Elin Madli Peets, Yan Zhou, Luca Crepaldi, Sunay Usluer, Alistair Dunham, Jana M Braunger, Jing Su, Magdalena E Strauss, Daniele Muraro, Kimberly Ai Xian Cheam, Marc Jan Bonder, Edgar Garriga Nogales, Sarah Cooper, Andrew Bassett, Steven Leonard, Yong Gu, Bo Fussing, David Burke, Leopold Parts, Oliver Stegle, Britta Velten
Population-scale resources of genetic, molecular, and cellular information form the basis for understanding human genomes, charting the heritable basis of disease and tracing the effects of mutations. Pooled perturbation assays, probing the effect of many perturbations coupled with single-cell RNA sequencing (scRNA-seq) readout, are especially potent references for interpreting disease-linked mutations or gene-expression changes. However, the utility of existing maps has been limited by the comprehensiveness of perturbations conducted and the relevance of their cell-line context. Here, we present a genome-scale CRISPR interference perturbation map with scRNA-seq readout across many genetic backgrounds in human pluripotent cells. We map trans expression changes induced by knockdowns and characterize their variation across donors, with expression quantitative trait loci linked to higher genetic modulation of perturbation effects. This study pioneers population-scale CRISPR perturbations with high-dimensional readouts, which will fuel the future of effective modulation of cellular disease phenotypes.
Title: A genome-scale single-cell CRISPRi map of trans gene regulation across human pluripotent stem cell lines. Journal: Cell Genomics, article 101076. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12903452/pdf/
Pub Date: 2026-02-11; Epub Date: 2025-12-02; DOI: 10.1016/j.xgen.2025.101082
Ying Shao, Quang Tran, Yuan Feng, Pandurang Kolekar, Yanling Liu, Zhikai Liang, Li Fan, Andrea McBride, Tyler Jones, Alexis Cameron, Heather Mulder, Lingyun Ji, Benjamin J Huang, Jeffery M Klco, Soheil Meshinchi, Jinghui Zhang, William L Carroll, Mignon L Loh, John Easton, Patrick A Brown, Xiaotu Ma
Despite extensive studies of the error profiles of SNVs, those of insertions/deletions (indels)/structural variants (SVs) remain elusive. Using ultra-deep sequencing, we show that the error rates of indel/SVs are >100-fold lower than those of SNVs, although repeat indels have high error rates of 1%. We validated this pattern in a cohort of 103 patients with relapsed B cell acute lymphoblastic leukemia (B-ALL). We analyzed repeat indels in 339 cancer driver genes and demonstrated that the number of repeat units is highly predictive of the error rate. We then analyzed minimal residual disease samples from 72 patients with relapsed B-ALL and demonstrated that our approach had positive detections in 61% of cases, outperforming clinical flow cytometry (51% detection). Overall, we established indel and SV error profiles in deep next-generation sequencing (NGS) data, enabling superior tumor detection at very low burdens, which has a significant impact on the clinical diagnosis and monitoring of human cancers and other diseases.
Title: Analysis of error profiles of indels and structural variants in deep-sequencing data. Journal: Cell Genomics, article 101082. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12903382/pdf/
Pub Date: 2026-02-11; Epub Date: 2025-11-19; DOI: 10.1016/j.xgen.2025.101068
Seon-Kyeong Jang, Zitian Wang, Richard Border, Dinh Tuan, Angela Wei, Ulzee An, Sriram Sankararaman, Vasilis Ntranos, Jonathan Flint, Noah Zaitlen
Protein language models (PLMs) improve variant effect predictions, but their role in gene discovery for complex traits remains unclear. We introduce an allelic series-based regression test that uses PLM-derived variant effect predictions as proxies for effect sizes, identifying ∼46% more associations than standard burden tests. Extending this to isoform-level analysis, we find 26 gene-trait pairs with stronger associations in non-canonical versus canonical transcripts, highlighting isoform-specific effects. Finally, we identify evolutionary plausible variants (EPVs), missense variants assigned higher likelihoods than the wild-type alleles by PLMs, representing 0.45% of missense variants. EPVs show higher allele frequencies than synonymous variants, consistent with differential selection pressures, and are linked to nine traits, including protective associations with low-density lipoprotein (LDL) and bone mineral density. Together, our results demonstrate how PLMs can enhance rare-variant interpretation and gene-trait association discovery in exome data.
Title: Leveraging protein language models to identify complex trait associations with previously inaccessible classes of functional rare variants. Journal: Cell Genomics, article 101068. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12903449/pdf/
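The idea of using predicted variant effects as effect-size proxies, mentioned in the abstract above, can be sketched as a weighted burden score compared with an unweighted one. This is a toy illustration, not the authors' allelic-series test: the genotypes, the scores standing in for PLM predictions, and the phenotypes are all invented, and the helper names are hypothetical.

```python
def weighted_burden(genotypes, weights):
    """Per-individual burden: sum of minor-allele counts times variant weights."""
    return [sum(g * w for g, w in zip(row, weights)) for row in genotypes]


def ols_slope(x, y):
    """Slope b of the least-squares line y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def r_squared(x, y):
    """Squared Pearson correlation between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# Toy data: 6 individuals x 3 rare variants in one gene (all values invented).
genotypes = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 0, 0]]
predicted_severity = [0.1, 0.5, 1.0]  # hypothetical stand-ins for PLM scores
phenotype = [0.0, 0.1, 0.5, 1.0, 0.6, 0.0]

unweighted = weighted_burden(genotypes, [1.0, 1.0, 1.0])
weighted = weighted_burden(genotypes, predicted_severity)
print("slope on weighted burden:", ols_slope(weighted, phenotype))
print("R^2 unweighted:", r_squared(unweighted, phenotype))
print("R^2 weighted:  ", r_squared(weighted, phenotype))
```

In this constructed example the phenotype tracks predicted severity, so the weighted score explains more variance than a simple carrier count. Real data would need covariates, proper test statistics, and multiple-testing control.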