Genome Research: latest articles

Gene duplication is associated with gene diversification and potential neofunctionalization in lung cancer evolution.
IF 5.5 | Zone 2 (Biology) | Q1 Biochemistry & Molecular Biology | Pub Date: 2026-03-02 | DOI: 10.1101/gr.278663.123
Paul Ashford, Alexander M Frankell, Zofia Piszka, Camilla S M Pang, Mahnaz Abbasian, Maise Al Bakir, Mariam Jamal-Hanjani, Nicholas McGranahan, Charles Swanton, Christine A Orengo

Tumors evolve through a process of selection on somatic mutations, driving cell division and tissue growth through aberrations in cell-cycle control. In non-small-cell lung cancer (NSCLC), genome instability occurs early in tumor growth, resulting in pronounced intratumor heterogeneity, including changes in gene copy number, and whole-genome doubling (WGD) in ∼75% of tumors. Gene duplication, genetic drift, and selection mediate functional diversification during evolution. In this study, we seek to identify the diversification and potential gene neofunctionalization of lung tumors in the TRACERx cohort. We develop a novel computational protocol to identify preduplication and postduplication mutations predicted to affect protein function. Mutations are analyzed using paralogs grouped into functional families with highly similar functions, identifying 355 functional impact events (FIEs) through their proximity and clustering near to functional sites. The use of functional family paralogs to map mutations to protein structures from the PDB helps predict putative rare driver events in lung tumors. By extending the analysis with high-quality structural models from AlphaFold using The Encyclopedia of Domains (TED), we find a significant increase in the diversity of both genes and functional families with postduplication FIEs in lung adenocarcinomas, including some metabolic enzymes with the potential to be neofunctional. The postduplication diversification of driver genes and functions may indicate selection for somatic copy number changes in lung tumors and an increased scope for tumor adaptations.

Genome Research, pp. 561-577. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12951968/pdf/
Citations: 0
Degrees of convergent evolution in rodent adaptations to arid environments.
IF 5.5 | Zone 2 (Biology) | Q1 Biochemistry & Molecular Biology | Pub Date: 2026-03-02 | DOI: 10.1101/gr.280089.124
Domitille Chalopin, Carine Rey, Jeremy Ganofsky, Juliana Blin, Pascale Chevret, Marion Mouginot, Laurent Guéguen, Bastien Boussau, Sophie Pantalacci, Marie Sémon

Species adapting to a similar lifestyle may undergo convergent changes in organ structure and cellular function, themselves relying or not on these convergent genetic changes. The extent of genomic convergence is thus debated and may further depend on the interplay between temporal factors, such as species relatedness or the age of the transition. Rodents have repeatedly adapted to life in arid conditions, notably with altered renal morphology and physiology. By analyzing kidney transcriptomes from 33 species, we find convergence at all examined biological levels, from the whole kidney transcriptome down to the coding sequences and expression level of individual genes. Transcriptome-level signatures reflect convergent changes in cell proportions, suggesting convergent structural adaptations of the kidney. A large proportion of genes shows convergent substitutions, but those happened in small subsets of species, showing that there are multiple genetic paths repeatedly taken in a mosaic manner. A similar mosaic signal of convergence is found comparing gene expression in species spanning the Rodentia order, but convergence is more widely shared at the lower level of the Murinae family. Therefore, we test more directly the influence of temporal factors. We observe more convergent changes when we select species independently adapted from more closely than more distantly related ancestors and when we select older transitions rather than recent transitions. Our study shows that there are many different, yet repeatedly selected, ways to adapt to aridity and that the degree of convergent evolution increases with both the age of the transitions and species relatedness.

Genome Research, pp. 472-486. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12951959/pdf/
Citations: 0
Early feature extraction drives model performance in high-resolution chromatin accessibility prediction.
IF 5.5 | Zone 2 (Biology) | Q1 Biochemistry & Molecular Biology | Pub Date: 2026-03-02 | DOI: 10.1101/gr.281042.125
Aayush Grover, Till Muser, Liine Kasak, Lin Zhang, Ekaterina Krymova, Valentina Boeva

Fine-grained prediction of chromatin accessibility from DNA sequence is a foundational step in modeling gene expression changes resulting from sequence variants. Yet, few methods operate at the resolution necessary to capture subtle effects of single-nucleotide changes. Furthermore, it remains unclear which architectural components, such as residual connections, normalization strategies, or attention mechanisms, drive performance in these high-resolution predictions. To address these knowledge gaps, we systematically evaluate classic architectural choices and introduce ConvNeXt V2 blocks, originally developed for computer vision, as high-resolution feature extractors in deep learning models for genomic data. Integrated into diverse architectures such as convolutional neural networks (CNNs), long short-term memory (LSTM), dilated CNNs, and transformers, ConvNeXt V2 blocks consistently improve performance, leading to similar prediction accuracy across these different model types. This reveals that early feature extraction, rather than downstream architecture, is the primary determinant of prediction accuracy. A comprehensive evaluation of these models on ATAC-seq signal prediction at 4-bp resolution in a cell type-specific manner identifies the ConvNeXt-based dilated CNN as the most robust performer, better preserving the signal's shape. Our codebase and benchmarks provide practical tools for high-resolution chromatin modeling.
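
To make the idea of a ConvNeXt V2 block as an early feature extractor more concrete, here is a minimal NumPy sketch of one such block applied along a 1D sequence. It follows the published ConvNeXt V2 design for images (depthwise convolution, layer normalization, pointwise expansion, GELU, Global Response Normalization, pointwise projection, residual connection) and is not the authors' implementation. The 1D adaptation, the kernel size of 7, the expansion factor of 4, the omission of learnable LayerNorm scale and shift, and all function and parameter names are assumptions made for illustration.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    # Normalize every sequence position across its channels (no learnable affine here).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def gelu(x):
    # Tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def grn(x, gamma, beta, eps=1e-6):
    # Global Response Normalization: per-channel L2 norm over the sequence axis,
    # divided by the mean of those norms across channels.
    gx = np.sqrt((x ** 2).sum(axis=0, keepdims=True))      # shape (1, C)
    nx = gx / (gx.mean(axis=-1, keepdims=True) + eps)      # shape (1, C)
    return gamma * (x * nx) + beta + x


def depthwise_conv1d(x, kernels):
    # x: (L, C), kernels: (K, C) with odd K. One filter per channel, zero "same" padding.
    # As usual in deep learning, this is a cross-correlation (no kernel flip).
    k_size = kernels.shape[0]
    pad = k_size // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x, dtype=float)
    for k in range(k_size):
        out += xp[k:k + x.shape[0]] * kernels[k]
    return out


def init_params(channels, kernel_size=7, expansion=4, seed=0):
    # Hypothetical initialization; GRN gamma and beta start at zero.
    rng = np.random.default_rng(seed)
    hidden = channels * expansion
    return {
        "dw": rng.normal(0.0, 0.1, size=(kernel_size, channels)),
        "w1": rng.normal(0.0, 0.1, size=(channels, hidden)),
        "b1": np.zeros(hidden),
        "gamma": np.zeros(hidden),
        "beta": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.1, size=(hidden, channels)),
        "b2": np.zeros(channels),
    }


def convnext_v2_block_1d(x, p):
    # x: (L, C) feature map, e.g. the output of a stem convolution over one-hot DNA.
    h = depthwise_conv1d(x, p["dw"])
    h = layer_norm(h)
    h = h @ p["w1"] + p["b1"]          # pointwise expansion C -> 4C
    h = gelu(h)
    h = grn(h, p["gamma"], p["beta"])
    h = h @ p["w2"] + p["b2"]          # pointwise projection 4C -> C
    return x + h                       # residual connection keeps the input resolution
```

Because the block preserves both sequence length and channel count, it can in principle be placed in front of any of the downstream architectures named in the abstract without changing their input shape.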

Genome Research, pp. 619-629. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12951969/pdf/
Citations: 0
Transcription and potential functions of a novel XIST isoform in male peripheral glia.
IF 5.5 | Zone 2 (Biology) | Q1 Biochemistry & Molecular Biology | Pub Date: 2026-02-03 | DOI: 10.1101/gr.280832.125
Kevin S O'Leary, Meng-Yen Li, Kevyn Jackson, Lijie Shi, Elena Ezhkova, Bernice E Morrow, Deyou Zheng

The XIST RNA is known for its critical roles in X Chromosome inactivation (XCI). It is thought to be expressed exclusively from one copy of the X Chromosome and silence it by recruiting various chromatin factors in female cells. In this study, we find XIST expression in male peripheral glia after integrated analyses of single-cell RNA-seq data from multiple human tissues and organs. Single-cell epigenomic data further indicate that the expression is likely driven by an alternative promoter at the end of the first exon, resulting in at least one shorter transcript (referred to as sXIST) that is active in Schwann cells and, moreover, at a higher level in nonmyelinating Schwann cells. This promoter exhibits similar activity in female glia. Multiple lines of evidence from bulk transcriptomic and epigenomic data from peripheral nerve tissues further support these findings. Genes coexpressed positively and strongly with sXIST in male glia show functional enrichment in axon assembly and cilia signaling, with many of them sharing putative miRNA binding sites with sXIST, whereas the negatively correlated genes are enriched for processes important for neuromuscular junctions. This suggests possible functions of sXIST in modulating glia-neuron interactions, perhaps via competitive miRNA binding. This idea is also supported by overexpression analysis of a partial sXIST sequence and the finding of significant XIST expression changes in human cardiomyopathy and polyneuropathy. In summary, the current study suggests a novel, non-XCI role of XIST in peripheral Schwann cells that is mediated by a newly recognized transcript.

Genome Research, pp. 257-274. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863056/pdf/
Citations: 0
Transcriptomic landscape of transposable elements reveals LTR7-PLAAT4 as a potential oncogene and therapeutic target in pancreatic adenocarcinoma.
IF 5.5 | Zone 2 (Biology) | Q1 Biochemistry & Molecular Biology | Pub Date: 2026-02-03 | DOI: 10.1101/gr.280528.125
Meilong Shi, Chuanqi Teng, Shan Zhang, Xiaobo He, Lingyun Xu, Fengxian Han, Rongqi Wen, Ganjun Yu, Jingwen Liu, Yang Feng, Yanfeng Wu, Yan Ren, Gang Jin, Jing Li

Eukaryotic genomes contain numerous transposable elements (TEs), whose dysregulation threatens genome stability and may contribute to cancer. Pancreatic adenocarcinoma (PAAD) is among the deadliest cancers, marked by abundant stroma that obscures tumor-specific molecular signals, complicating bulk-tissue analyses. Here, using 71 patient-derived PAAD organoids, we show that TE activities may promote tumorigenesis and provide a source of novel immunotherapeutic targets. We identify 16 new TE-derived transcripts fused with 15 known oncogenes, exhibiting potential oncogenic function and prognostic value. Notably, LTR7-PLAAT4, present in 29% of tumors, encodes a protein variant transcriptionally regulated by FOXM1 binding to the LTR7 promoter. LTR7-PLAAT4 isoform 2 is associated with increased cholesterol ester accumulation and lipid droplet formation mediated through BSCL2 coexpression, potentially fostering tumor progression. On the immunogenic front, HLA-I immunopeptidomics of AsPC-1 cells and DAC13 organoids identifies over 11,000 peptides in each. Although mutation-derived neoantigens are rare, several peptides originate from TE-chimeric transcripts, including four predicted by TEprof2. The peptide FLIQHLPLV, detected in 27% of organoids, exhibits robust immunogenicity, validated by T2 binding, mass spectrometry, and ELISPOT assays with HLA-genotyped PBMCs. Together, these findings suggest that TE activities may contribute to PAAD progression and diversify its immunopeptidome, providing new opportunities for molecular subtyping and potential immunotherapeutic intervention.

Genome Research, pp. 275-290. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863058/pdf/
Citations: 0
The SynMall resource for characterizing the functional impact of synonymous variation.
IF 5.5 | Zone 2 (Biology) | Q1 Biochemistry & Molecular Biology | Pub Date: 2026-02-03 | DOI: 10.1101/gr.281257.125
Chen Ye, Xiaoyan Li, Na Cheng, Yansen Su, Junfeng Xia

Synonymous single-nucleotide variants (sSNVs) are increasingly recognized as contributors to disease, yet existing variant annotation databases offer limited functional insights for sSNVs. Here, we present SynMall, a comprehensive resource designed to decipher the functional impact of synonymous variation. SynMall catalogs 25 million potential human sSNVs and integrates evolutionary and population information of sSNVs from 45 non-human species. For each human sSNV, SynMall provides multilevel annotations that combine American College of Medical Genetics and Genomics (ACMG)-aligned variant interpretation information, such as allele frequencies and functional effects, with more than 100 descriptors at the DNA, RNA, and protein levels. These include both handcrafted features and embeddings from large language models to support advanced representation learning. To prioritize pathogenic sSNVs, we have developed SynScore, a machine learning framework that integrates ACMG guidelines and diverse biological characteristics. Benchmark comparisons show that SynScore achieves state-of-the-art performance, validating its effectiveness for genome-wide pathogenicity inference. Furthermore, SynMall enables mechanistic exploration by investigating in silico assessments and curated literature evidence to evaluate sSNV effects on miRNA-mRNA interactions, mRNA splicing, mRNA stability, and codon usage. By consolidating these features into a unified platform, we anticipate that SynMall will serve as a valuable resource for elucidating the functional role of synonymous mutations.

Genome Research, pp. 421-431. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863188/pdf/
Citations: 0
Quantifying pathological progression from single-cell transcriptomic data with scPSS.
IF 5.5 | Zone 2 (Biology) | Q1 Biochemistry & Molecular Biology | Pub Date: 2026-02-03 | DOI: 10.1101/gr.280411.125
Samin Rahman Khan, M Saifur Rahman, M Sohel Rahman, Md Abul Hassan Samee

The surge in single-cell data sets and reference atlases has enabled the comparison of cell states across conditions, yet a gap persists in quantifying pathological shifts from healthy cell states. To address this gap, we introduce single-cell Pathological Shift Scoring (scPSS), which provides a statistical measure for how much a "query" cell from a diseased sample has shifted away from a reference group of healthy cells. In scPSS, the distance of a cell to its k-th nearest reference cell is considered as its pathological shift score. Euclidean distances in the top n principal component space of the gene expressions are used to measure distances between cells. The distribution of shift scores of the reference cells forms a null model. This allows a P-value to be assigned to each query cell's shift score, quantifying its statistical significance of being in the reference cell group. This makes our method both simple and statistically rigorous. The key strength of scPSS is its applicability in a "semisupervised" setting, where only healthy reference cells are known and diseased-labeled data are not provided for model training. As existing methods do not support cell-level pathological progression measurement in this setting, we adapt state-of-the-art supervised pathological prediction and contrastive models for benchmarking. Comparative evaluations against these adapted models demonstrate our method's superiority in accuracy and efficiency. Additionally, we show that the aggregation of cell-level pathological scores from scPSS can be used to predict health conditions at the individual level.
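
The scoring procedure described in this abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the scPSS implementation: it assumes expression matrices with cells as rows, fits the principal components on the reference cells only, builds the null distribution from each reference cell's distance to its k-th nearest other reference cell, and assigns an empirical upper-tail P-value. The function names and the defaults `k=5` and `n_components=10` are hypothetical.

```python
import numpy as np


def pca_fit(ref, n_components):
    # Principal axes of the centered reference matrix via SVD (assumption: fit on reference only).
    mean = ref.mean(axis=0)
    _, _, vt = np.linalg.svd(ref - mean, full_matrices=False)
    return mean, vt[:n_components].T          # axes: (genes, n_components)


def kth_nn_distance(points, ref, k, exclude_self=False):
    # Euclidean distance from each point to its k-th nearest reference cell.
    diff = points[:, None, :] - ref[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    d.sort(axis=1)
    # When reference cells are scored against themselves, column 0 is the
    # zero distance to the cell itself, so skip it.
    idx = k if exclude_self else k - 1
    return d[:, idx]


def shift_scores_and_pvalues(ref_expr, query_expr, k=5, n_components=10):
    mean, axes = pca_fit(ref_expr, n_components)
    ref_pc = (ref_expr - mean) @ axes
    qry_pc = (query_expr - mean) @ axes

    # Null model: shift scores of the healthy reference cells themselves.
    null = kth_nn_distance(ref_pc, ref_pc, k, exclude_self=True)
    # Pathological shift score of each query cell.
    scores = kth_nn_distance(qry_pc, ref_pc, k)

    # Empirical P-value: fraction of null scores at least as large as the
    # query score, with a +1 correction so that no P-value is exactly zero.
    exceed = (null[None, :] >= scores[:, None]).sum(axis=1)
    pvals = (1 + exceed) / (1 + null.size)
    return scores, pvals
```

A small P-value here means the query cell lies farther from the healthy reference than nearly all reference cells lie from one another. Averaging scores, or counting significant cells, per individual would be one simple way to aggregate to the sample level, although the abstract does not specify how that aggregation is done.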

Genome Research, pages 375-386. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863187/pdf/
Citations: 0
The oligogenic inheritance test GCOD detects risk genes and their interactions in congenital heart defects.
IF 5.5 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2026-02-03 DOI: 10.1101/gr.281141.125
Maureen Pittman, Kihyun Lee, Franco Felix, Yu Huang, Adrienne Lam, Mauro W Costa, Deepak Srivastava, Katherine S Pollard

Exome sequencing of thousands of families has revealed many risk genes for congenital heart defects (CHDs), yet most cases cannot be explained by a single causal mutation. Even within the same family, individuals carrying a particular mutation in a known risk gene often demonstrate variable phenotypes, suggesting the presence of genetic modifiers. To explore oligogenic causes of CHD without assessing billions of variant combinations, we develop an efficient, simulation-based method to detect gene sets that carry co-occurring damaging variants in probands at a higher rate than expected given parental genotypes. We implement this approach in software called Gene Combinations in Oligogenic Disease (GCOD) and apply it to a cohort of 3377 CHD trios with exome sequencing. This analysis detects 160 gene pairs in which damaging variants are transmitted with higher-than-expected frequency to CHD probands but rarely or never appear in combination in their unaffected parents. Stratifying by specific phenotypes and considering gene combinations of higher orders yields an additional 6026 gene sets. Genes found in oligogenic sets are overrepresented in pathways related to heart development and often co-occur in sets of cell type marker genes from single-cell expression data. Compound heterozygosity of the newly identified digenic pair Gata6-Por leads to higher CHD incidence in mice compared with single hemizygotes, validating predicted genetic interactions. As genome sequencing is applied to more families and other disorders, GCOD will enable detection of increasingly large, novel gene combinations, shedding light on combinatorial causes of genetic diseases.
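The general idea of testing co-occurrence against an expectation derived from parental genotypes can be sketched as a small Monte Carlo simulation. This is not GCOD's algorithm: it assumes a single gene pair, unlinked genes, heterozygous parental variants transmitted independently with probability 1/2, and a hypothetical input layout (`het_a`, `het_b`, `proband_has_both`).

```python
import random


def transmitted(n_het_alleles, rng):
    """True if at least one heterozygous parental allele is passed on."""
    # Each heterozygous parental allele is transmitted with probability 1/2.
    return any(rng.random() < 0.5 for _ in range(n_het_alleles))


def cooccurrence_pvalue(trios, n_sim=2000, seed=0):
    """Empirical P-value for excess co-occurrence in probands.

    Each trio is a dict (hypothetical layout) with:
      het_a, het_b      -- damaging heterozygous alleles in the two parents
                           for gene A and gene B (0, 1, or 2)
      proband_has_both  -- whether the proband carries both variants
    """
    rng = random.Random(seed)
    observed = sum(1 for t in trios if t["proband_has_both"])
    hits = 0
    for _ in range(n_sim):
        # Re-draw every proband under Mendelian transmission alone.
        simulated = sum(
            1
            for t in trios
            if transmitted(t["het_a"], rng) and transmitted(t["het_b"], rng)
        )
        if simulated >= observed:
            hits += 1
    return observed, (1 + hits) / (1 + n_sim)


# Toy cohort: 40 trios in which each gene has one heterozygous parental
# carrier and every proband inherited both variants.
enriched = [{"het_a": 1, "het_b": 1, "proband_has_both": True}] * 40
obs_enriched, p_enriched = cooccurrence_pvalue(enriched)

# Same parents, but no proband carries both variants.
depleted = [{"het_a": 1, "het_b": 1, "proband_has_both": False}] * 40
obs_depleted, p_depleted = cooccurrence_pvalue(depleted)
```

Under these assumptions about a quarter of the toy probands are expected to carry both variants by chance, so observing all 40 gives a very small P-value, whereas observing none gives a P-value of 1. A real analysis would also have to handle de novo variants, linkage, and multiple testing across gene sets.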

Genome Research, pages 330-347. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863180/pdf/
Citations: 0
Landscape of microRNA and target expression variation and covariation in single mouse embryonic stem cells.
IF 5.5 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2026-02-03 DOI: 10.1101/gr.279914.124
Marcel Tarbier, Sebastian D Mackowiak, Vaishnovi Sekar, Franziska Bonath, Etka Yapar, Bastian Fromm, Omid R Faridani, Inna Biryukova, Marc R Friedländer

microRNAs are small RNA molecules that can repress the expression of protein-coding genes post-transcriptionally. Previous studies have shown that microRNAs can also have alternative functions, including influencing target expression variation and covariation, but these observations have been limited to a few microRNAs. Here we systematically study microRNA alternative functions in mouse embryonic stem cells (mESCs) by genetically deleting Drosha, leading to global loss of microRNAs. We apply complementary single-cell RNA-seq methods to study the variation of the targets and the microRNAs themselves, and transcriptional inhibition to measure target half-lives. We find that microRNAs form four distinct coexpression groups across single cells. In particular, the mir-290 and the mir-182 genome clusters are abundantly, variably, and inversely expressed. Some cells have global biases toward specific miRNAs originating from either end of the hairpin precursor, suggesting the presence of unknown regulatory cofactors. We find that microRNAs generally increase variation and covariation of their targets at the RNA level, but we also find microRNAs such as miR-182 that appear to have opposite functions. In particular, microRNAs that are themselves variable in expression, such as miR-291a, are more likely to induce covariations. In summary, we apply genetic perturbation and multiomics to give the first global picture of microRNA dynamics at the single-cell level.

Genome Research, pages 291-302. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863184/pdf/
Citations: 0
Strain-level metagenomic profiling using pangenome graphs with PanTax.
IF 5.5 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2026-02-03 DOI: 10.1101/gr.280858.125
Wenhai Zhang, Yuansheng Liu, Guangyi Li, Jialu Xu, Enlian Chen, Alexander Schönhuth, Xiao Luo

Microbes are omnipresent, thriving in a range of habitats, from oceans to soils, and even within our gastrointestinal tracts. They play a vital role in maintaining ecological equilibrium and promoting the health of their hosts. Consequently, understanding the diversity in terms of strains in microbial communities is crucial, as variations between strains can lead to different phenotypic expressions or diverse biological functions. However, current methods for taxonomic classification from metagenomic sequencing data have several limitations, including their reliance solely on species resolution, support for either short or long reads, or their confinement to a given single species. Most notably, most existing strain-level taxonomic classifiers rely on the sequence representation of multiple linear reference genomes, which fails to capture the sequence correlations among these genomes, potentially introducing ambiguity and biases in metagenomic profiling. Here, we present PanTax, a pangenome graph-based taxonomic profiler that overcomes the shortcomings of sequence-based approaches, because pangenome graphs possess the capability to depict the full range of genetic variability present across multiple evolutionarily or environmentally related genomes. PanTax provides a comprehensive solution to taxonomic classification for strain resolution, compatibility with both short and long reads, and compatibility with single or multiple species. Extensive benchmarking results demonstrate that PanTax drastically outperforms state-of-the-art approaches, primarily evidenced by its significantly higher F1 score at the strain level, while maintaining comparable or better performance in other aspects across various data sets.

Genome Research, pages 405-420. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863173/pdf/
Citations: 0