Genome Research: latest articles

Transcriptomic landscape of transposable elements reveals LTR7-PLAAT4 as a potential oncogene and therapeutic target in pancreatic adenocarcinoma.
IF 5.5 | Zone 2 (Biology) | Q1 Biochemistry & Molecular Biology | Pub Date: 2026-02-03 | DOI: 10.1101/gr.280528.125
Meilong Shi, Chuanqi Teng, Shan Zhang, Xiaobo He, Lingyun Xu, Fengxian Han, Rongqi Wen, Ganjun Yu, Jingwen Liu, Yang Feng, Yanfeng Wu, Yan Ren, Gang Jin, Jing Li

Eukaryotic genomes contain numerous transposable elements (TEs), whose dysregulation threatens genome stability and may contribute to cancer. Pancreatic adenocarcinoma (PAAD) is among the deadliest cancers, marked by abundant stroma that obscures tumor-specific molecular signals, complicating bulk-tissue analyses. Here, using 71 patient-derived PAAD organoids, we show that TE activities may promote tumorigenesis and provide a source of novel immunotherapeutic targets. We identify 16 new TE-derived transcripts fused with 15 known oncogenes, exhibiting potential oncogenic function and prognostic value. Notably, LTR7-PLAAT4, present in 29% of tumors, encodes a protein variant transcriptionally regulated by FOXM1 binding to the LTR7 promoter. LTR7-PLAAT4 isoform 2 is associated with increased cholesterol ester accumulation and lipid droplet formation mediated through BSCL2 coexpression, potentially fostering tumor progression. On the immunogenic front, HLA-I immunopeptidomics of AsPC-1 cells and DAC13 organoids each identify over 11,000 peptides. Although mutation-derived neoantigens are rare, several peptides originate from TE-chimeric transcripts, including four predicted by TEprof2. The peptide FLIQHLPLV, detected in 27% of organoids, exhibits robust immunogenicity, validated by T2 binding, mass spectrometry and ELISPOT assays with HLA-genotyped PBMCs. Together, these findings suggest that TE activities may contribute to PAAD progression and diversify its immunopeptidome, providing new opportunities for molecular subtyping and potential immunotherapeutic intervention.

Genome Research, pp. 275-290 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863058/pdf/ | Citations: 0
Transcription and potential functions of a novel XIST isoform in male peripheral glia.
IF 5.5 | Zone 2 (Biology) | Q1 Biochemistry & Molecular Biology | Pub Date: 2026-02-03 | DOI: 10.1101/gr.280832.125
Kevin S O'Leary, Meng-Yen Li, Kevyn Jackson, Lijie Shi, Elena Ezhkova, Bernice E Morrow, Deyou Zheng

The XIST RNA is known for its critical roles in X Chromosome inactivation (XCI). It is thought to be expressed exclusively from one copy of the X Chromosome and silence it by recruiting various chromatin factors in female cells. In this study, we find XIST expression in male peripheral glia after integrated analyses of single-cell RNA-seq data from multiple human tissues and organs. Single-cell epigenomic data further indicate that the expression is likely driven by an alternative promoter at the end of the first exon, resulting in at least one shorter transcript (referred to as sXIST) that is active in Schwann cells and, moreover, at a higher level in nonmyelinating Schwann cells. This promoter exhibits similar activity in female glia. Multiple lines of evidence from bulk transcriptomic and epigenomic data from peripheral nerve tissues further support these findings. Genes coexpressed positively and strongly with sXIST in male glia show functional enrichment in axon assembly and cilia signaling, with many of them sharing putative miRNA binding sites with sXIST, whereas the negatively correlated genes are enriched for processes important for neuromuscular junctions. This suggests possible functions of sXIST in modulating glia-neuron interactions, perhaps via competitive miRNA binding. This idea is also supported by overexpression analysis of a partial sXIST sequence and the finding of significant XIST expression changes in human cardiomyopathy and polyneuropathy. In summary, the current study suggests a novel, non-XCI role of XIST in peripheral Schwann cells that is mediated by a newly recognized transcript.

Genome Research, pp. 257-274 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863056/pdf/ | Citations: 0
The SynMall resource for characterizing the functional impact of synonymous variation.
IF 5.5 | Zone 2 (Biology) | Q1 Biochemistry & Molecular Biology | Pub Date: 2026-02-03 | DOI: 10.1101/gr.281257.125
Chen Ye, Xiaoyan Li, Na Cheng, Yansen Su, Junfeng Xia

Synonymous single-nucleotide variants (sSNVs) are increasingly recognized as contributors to disease, yet existing variant annotation databases offer limited functional insights for sSNVs. Here, we present SynMall, a comprehensive resource designed to decipher the functional impact of synonymous variation. SynMall catalogs 25 million potential human sSNVs and integrates evolutionary and population information of sSNVs from 45 non-human species. For each human sSNV, SynMall provides multilevel annotations that combine American College of Medical Genetics and Genomics (ACMG)-aligned variant interpretation information, such as allele frequencies and functional effects, with more than 100 descriptors at the DNA, RNA, and protein levels. These include both handcrafted features and embeddings from large language models to support advanced representation learning. To prioritize pathogenic sSNVs, we have developed SynScore, a machine learning framework that integrates ACMG guidelines and diverse biological characteristics. Benchmark comparisons show that SynScore achieves state-of-the-art performance, validating its effectiveness for genome-wide pathogenicity inference. Furthermore, SynMall enables mechanistic exploration by investigating in silico assessments and curated literature evidence to evaluate sSNV effects on miRNA-mRNA interactions, mRNA splicing, mRNA stability, and codon usage. By consolidating these features into a unified platform, we anticipate that SynMall will serve as a valuable resource for elucidating the functional role of synonymous mutations.

Genome Research, pp. 421-431 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863188/pdf/ | Citations: 0
Quantifying pathological progression from single-cell transcriptomic data with scPSS.
IF 5.5 | Zone 2 (Biology) | Q1 Biochemistry & Molecular Biology | Pub Date: 2026-02-03 | DOI: 10.1101/gr.280411.125
Samin Rahman Khan, M Saifur Rahman, M Sohel Rahman, Md Abul Hassan Samee

The surge in single-cell data sets and reference atlases has enabled the comparison of cell states across conditions, yet a gap persists in quantifying pathological shifts from healthy cell states. To address this gap, we introduce single-cell Pathological Shift Scoring (scPSS), which provides a statistical measure for how much a "query" cell from a diseased sample has shifted away from a reference group of healthy cells. In scPSS, the distance of a cell to its k-th nearest reference cell is considered as its pathological shift score. Euclidean distances in the top n principal component space of the gene expressions are used to measure distances between cells. The distribution of shift scores of the reference cells forms a null model. This allows a P-value to be assigned to each query cell's shift score, quantifying its statistical significance of being in the reference cell group. This makes our method both simple and statistically rigorous. The key strength of scPSS is its applicability in a "semisupervised" setting, where only healthy reference cells are known and diseased-labeled data are not provided for model training. As existing methods do not support cell-level pathological progression measurement in this setting, we adapt state-of-the-art supervised pathological prediction and contrastive models for benchmarking. Comparative evaluations against these adapted models demonstrate our method's superiority in accuracy and efficiency. Additionally, we show that the aggregation of cell-level pathological scores from scPSS can be used to predict health conditions at the individual level.
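
The scoring procedure described in this abstract can be sketched in a few lines. What follows is a minimal illustration, not the authors' scPSS implementation. It assumes the cells have already been projected onto their top n principal components (the PCA step is omitted), and it computes an empirical right-tail P-value with a +1 correction, which is an assumption because the abstract does not state how the P-value is derived from the null distribution. The function names and the default value of k are hypothetical.

```python
import math


def kth_nearest_distance(point, reference, k, skip_self=False):
    """Euclidean distance from `point` to its k-th nearest cell in `reference`.

    `point` and every entry of `reference` are coordinate tuples in PC space.
    Set skip_self=True when `point` is itself a member of `reference`, so the
    zero distance to itself is not counted as a neighbor.
    """
    dists = sorted(math.dist(point, ref) for ref in reference)
    return dists[k] if skip_self else dists[k - 1]


def shift_scores_and_p_values(reference, queries, k=3):
    """Return a (shift_score, p_value) pair for each query cell.

    Null model: the distance of each healthy reference cell to its k-th
    nearest *other* reference cell. The p-value is the (assumed) empirical
    fraction of null scores at least as large as the query score.
    """
    null = [kth_nearest_distance(c, reference, k, skip_self=True) for c in reference]
    results = []
    for q in queries:
        score = kth_nearest_distance(q, reference, k)
        p_value = (1 + sum(s >= score for s in null)) / (1 + len(null))
        results.append((score, p_value))
    return results
```

A query cell lying inside the reference cloud receives a small score and a large P-value, while a cell far from every reference cell receives a large score and the smallest attainable P-value, 1 / (number of reference cells + 1). Because only healthy cells define the null, no diseased labels are needed, which matches the "semisupervised" setting the abstract describes.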

Genome Research, pp. 375-386 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863187/pdf/ | Citations: 0
The oligogenic inheritance test GCOD detects risk genes and their interactions in congenital heart defects.
IF 5.5 | Zone 2 (Biology) | Q1 Biochemistry & Molecular Biology | Pub Date: 2026-02-03 | DOI: 10.1101/gr.281141.125
Maureen Pittman, Kihyun Lee, Franco Felix, Yu Huang, Adrienne Lam, Mauro W Costa, Deepak Srivastava, Katherine S Pollard

Exome sequencing of thousands of families has revealed many risk genes for congenital heart defects (CHDs), yet most cases cannot be explained by a single causal mutation. Even within the same family, individuals carrying a particular mutation in a known risk gene often demonstrate variable phenotypes, suggesting the presence of genetic modifiers. To explore oligogenic causes of CHD without assessing billions of variant combinations, we develop an efficient, simulation-based method to detect gene sets that carry co-occurring damaging variants in probands at a higher rate than expected given parental genotypes. We implement this approach in software called Gene Combinations in Oligogenic Disease (GCOD) and apply it to a cohort of 3377 CHD trios with exome sequencing. This analysis detects 160 gene pairs in which damaging variants are transmitted with higher-than-expected frequency to CHD probands but rarely or never appear in combination in their unaffected parents. Stratifying by specific phenotypes and considering gene combinations of higher orders yields an additional 6026 gene sets. Genes found in oligogenic sets are overrepresented in pathways related to heart development and often co-occur in sets of cell type marker genes from single-cell expression data. Compound heterozygosity of the newly identified digenic pair Gata6-Por leads to higher CHD incidence in mice compared with single hemizygotes, validating predicted genetic interactions. As genome sequencing is applied to more families and other disorders, GCOD will enable detection of increasingly large, novel gene combinations, shedding light on combinatorial causes of genetic diseases.
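
The idea of comparing observed co-occurrence in probands against an expectation derived from parental genotypes can be illustrated with a toy simulation for a single gene pair. This is a sketch under strong simplifying assumptions and is not the GCOD software: each parent is treated as a heterozygous carrier or non-carrier of one damaging variant per gene, each carried variant is transmitted independently with probability 1/2, the two genes are unlinked, and de novo variants are ignored. The trio data layout and all function names are hypothetical.

```python
import random


def transmitted(parent_carries, rng):
    """A heterozygous damaging variant is passed on with probability 1/2."""
    return bool(parent_carries) and rng.random() < 0.5


def cooccurrence_test(trios, n_sim=2000, seed=0):
    """Compare observed proband co-occurrence with simulated Mendelian transmission.

    Each trio is a dict with keys 'mother', 'father' and 'proband'; every value
    is a (carries_gene_A, carries_gene_B) pair of booleans.
    Returns (observed_count, expected_count, empirical_p_value).
    """
    rng = random.Random(seed)
    observed = sum(1 for t in trios if t["proband"][0] and t["proband"][1])
    null_counts = []
    for _ in range(n_sim):
        count = 0
        for t in trios:
            # Simulate which parental variants a child would inherit.
            has_a = transmitted(t["mother"][0], rng) or transmitted(t["father"][0], rng)
            has_b = transmitted(t["mother"][1], rng) or transmitted(t["father"][1], rng)
            count += has_a and has_b
        null_counts.append(count)
    expected = sum(null_counts) / n_sim
    # Right-tail empirical p-value with a +1 correction.
    p_value = (1 + sum(c >= observed for c in null_counts)) / (1 + n_sim)
    return observed, expected, p_value
```

If the mother carries a variant only in gene A and the father only in gene B, a child inherits both with probability 1/4 under these assumptions, so a cohort in which nearly every proband carries both variants yields a very small P-value. Conditioning on parental genotypes in this way means the null reflects what each family could actually transmit, rather than population-wide variant frequencies.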

Genome Research, pp. 330-347 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863180/pdf/ | Citations: 0
spRefine denoises and imputes spatial transcriptomics with a reference-free framework powered by genomic language model.
IF 5.5 | Zone 2 (Biology) | Q1 Biochemistry & Molecular Biology | Pub Date: 2026-02-03 | DOI: 10.1101/gr.281001.125
Tianyu Liu, Tinglin Huang, Wengong Jin, Tinyi Chu, Rex Ying, Hongyu Zhao

The analysis of spatial transcriptomics is hindered by high noise levels and missing gene measurements, challenges that are further compounded by the higher cost of spatial data compared to traditional single-cell data. To overcome these challenges, we introduce spRefine, a deep learning framework that leverages genomic language models to jointly denoise and impute spatial transcriptomic data. Our results demonstrate that spRefine yields more robust cell- and spot-level representations after denoising and imputation, substantially improving data integration. In addition, spRefine serves as a strong framework for model pretraining and the discovery of novel biological signals, as highlighted by multiple downstream applications across datasets of varying scales. Notably, spRefine enhances the accuracy of spatial aging clock estimations and uncovers new aging-related relationships associated with key biological processes, such as neuronal function loss, which offers new insights for analyzing aging effects with spatial transcriptomics.

Genome Research | Citations: 0
Landscape of microRNA and target expression variation and covariation in single mouse embryonic stem cells.
IF 5.5 | Zone 2 (Biology) | Q1 Biochemistry & Molecular Biology | Pub Date: 2026-02-03 | DOI: 10.1101/gr.279914.124
Marcel Tarbier, Sebastian D Mackowiak, Vaishnovi Sekar, Franziska Bonath, Etka Yapar, Bastian Fromm, Omid R Faridani, Inna Biryukova, Marc R Friedländer

microRNAs are small RNA molecules that can repress the expression of protein-coding genes post-transcriptionally. Previous studies have shown that microRNAs can also have alternative functions, including influencing target expression variation and covariation, but these observations have been limited to a few microRNAs. Here we systematically study microRNA alternative functions in mouse embryonic stem cells (mESCs) by genetically deleting Drosha, leading to global loss of microRNAs. We apply complementary single-cell RNA-seq methods to study the variation of the targets and the microRNAs themselves, and transcriptional inhibition to measure target half-lives. We find that microRNAs form four distinct coexpression groups across single cells. In particular, the mir-290 and the mir-182 genome clusters are abundantly, variably, and inversely expressed. Some cells have global biases toward specific miRNAs originating from either end of the hairpin precursor, suggesting the presence of unknown regulatory cofactors. We find that microRNAs generally increase variation and covariation of their targets at the RNA level, but we also find microRNAs such as miR-182 that appear to have opposite functions. In particular, microRNAs that are themselves variable in expression, such as miR-291a, are more likely to induce covariations. In summary, we apply genetic perturbation and multiomics to give the first global picture of microRNA dynamics at the single-cell level.

Genome research, pages 291-302. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863184/pdf/
Citations: 0
Strain-level metagenomic profiling using pangenome graphs with PanTax.
IF 5.5 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2026-02-03 DOI: 10.1101/gr.280858.125
Wenhai Zhang, Yuansheng Liu, Guangyi Li, Jialu Xu, Enlian Chen, Alexander Schönhuth, Xiao Luo

Microbes are omnipresent, thriving in a range of habitats, from oceans to soils, and even within our gastrointestinal tracts. They play a vital role in maintaining ecological equilibrium and promoting the health of their hosts. Consequently, understanding the diversity in terms of strains in microbial communities is crucial, as variations between strains can lead to different phenotypic expressions or diverse biological functions. However, current methods for taxonomic classification from metagenomic sequencing data have several limitations, including their reliance solely on species resolution, support for either short or long reads, or their confinement to a given single species. Most notably, most existing strain-level taxonomic classifiers rely on the sequence representation of multiple linear reference genomes, which fails to capture the sequence correlations among these genomes, potentially introducing ambiguity and biases in metagenomic profiling. Here, we present PanTax, a pangenome graph-based taxonomic profiler that overcomes the shortcomings of sequence-based approaches, because pangenome graphs possess the capability to depict the full range of genetic variability present across multiple evolutionarily or environmentally related genomes. PanTax provides a comprehensive solution to taxonomic classification for strain resolution, compatibility with both short and long reads, and compatibility with single or multiple species. Extensive benchmarking results demonstrate that PanTax drastically outperforms state-of-the-art approaches, primarily evidenced by its significantly higher F1 score at the strain level, while maintaining comparable or better performance in other aspects across various data sets.

Genome research, pages 405-420. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863173/pdf/
Citations: 0
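The PanTax abstract above reports its main result as an F1 score at the strain level. As a minimal sketch of what that metric measures, the snippet below computes precision, recall, and F1 from a set of predicted strain identifiers and a set of true ones. The function name `strain_f1`, the strain labels, and the set-based presence/absence scoring are illustrative assumptions; the abstract does not describe the paper's actual evaluation protocol.

```python
def strain_f1(predicted, truth):
    """Return (precision, recall, F1) for predicted vs. true strain IDs.

    Assumption: a strain counts as correct if its ID appears in both sets.
    """
    predicted, truth = set(predicted), set(truth)
    true_pos = len(predicted & truth)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical sample: four strains are truly present, three are reported.
truth = {"strain_A", "strain_B", "strain_C", "strain_D"}
predicted = {"strain_A", "strain_B", "strain_X"}
precision, recall, f1 = strain_f1(predicted, truth)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

F1 is the harmonic mean of precision and recall, so a profiler that reports many spurious strains or misses many true ones is penalized either way.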
Autoencoders for genomic variation analysis.
IF 5.5 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2026-02-03 DOI: 10.1101/gr.280086.124
Margarita Geleta, Daniel Mas Montserrat, Xavier Giro-I-Nieto, Alexander G Ioannidis

Modern biobanks are providing numerous high-resolution genomic sequences of diverse populations. To account for diverse and admixed populations, new algorithmic tools are needed to properly capture the genetic composition of populations. Here, we explore deep learning techniques, namely, variational autoencoders (VAEs), to process genomic data from a population perspective. We show the power of VAEs for a variety of tasks relating to the interpretation, compression, classification, and simulation of genomic data with several worldwide whole genome data sets from both humans and canids, and evaluate the performance of the proposed applications with and without ancestry conditioning. The unsupervised setting of autoencoders allows for the detection and learning of granular population structure and the inference of informative latent factors. The learned latent spaces of VAEs are able to capture and represent differentiated Gaussian-like clusters of samples with similar genetic composition on a fine scale from single nucleotide polymorphisms (SNPs), enabling applications in dimensionality reduction and data simulation. These individual genotype sequences can then be decomposed into latent representations and reconstruction errors (residuals), which provide a sparse representation useful for lossless compression. We show that different populations have differentiated compression ratios and classification accuracies. Additionally, we analyze the entropy of the SNP data, its effect on compression across populations, and its relation to historical migrations, and we show how to introduce autoencoders into existing compression pipelines.

Genome research, pages 348-360. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863191/pdf/
Citations: 0
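The autoencoder abstract above mentions decomposing genotypes into a reconstruction plus sparse residuals for lossless compression. The sketch below illustrates only that general idea on toy data and is not the authors' method: the genotype matrix is randomly generated, and a simple "most common allele per SNP" predictor stands in for a trained VAE decoder. All variable names are hypothetical.

```python
import random
import zlib

random.seed(0)
n_individuals, n_snps = 200, 500

# Toy genotype matrix coded 0/1 (hypothetical data, not from the paper).
allele_freq = [random.uniform(0.05, 0.95) for _ in range(n_snps)]
genotypes = [
    [1 if random.random() < f else 0 for f in allele_freq]
    for _ in range(n_individuals)
]

# Stand-in for a decoder output: the most common allele at each SNP.
# A trained VAE would instead produce a per-individual reconstruction.
major = [
    1 if sum(row[j] for row in genotypes) * 2 > n_individuals else 0
    for j in range(n_snps)
]

# Residual: 1 wherever the reconstruction disagrees with the data (XOR).
residual = [[g ^ m for g, m in zip(row, major)] for row in genotypes]

# Lossless round trip: reconstruction XOR residual recovers the input.
restored = [[r ^ m for r, m in zip(row, major)] for row in residual]
assert restored == genotypes


def compressed_size(matrix):
    """Size in bytes after zlib-compressing the matrix, one byte per value."""
    return len(zlib.compress(bytes(v for row in matrix for v in row), 9))


residual_density = sum(map(sum, residual)) / (n_individuals * n_snps)
print("fraction of nonzero residuals:", residual_density)
print("compressed bytes (raw, residual):",
      compressed_size(genotypes), compressed_size(residual))
```

The better the reconstruction, the sparser the residual, and the less a general-purpose compressor has to store alongside the model's output. Here the residual marks only minor-allele positions, so at most half of its entries can be nonzero.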
scSHEFT enables multiomics label transfer from scRNA-seq to scATAC-seq through dual alignment.
IF 5.5 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2026-02-03 DOI: 10.1101/gr.280410.125
Zhitao Huang, Ruiqing Zheng, Pengzhen Jia, Xuhua Yan, Jinmiao Chen, Min Li

Currently, with the emergence of abundant single-cell multiomics data, there is a trend where labels are transferred from well-annotated scRNA-seq data to less-annotated omics data, such as scATAC-seq. This approach leverages the gene expression profiles available in scRNA-seq to help annotate common cell types and even novel cell types for other omics data. However, the heterogeneous features between scRNA-seq and scATAC-seq pose challenges for identifying different cell types, which hinders the discovery of novel types. In this study, we propose a new label transfer tool, scSHEFT, which simultaneously considers gene expression count data, peak count data, and Gene Activity Scores as inputs to bridge the gap of heterogeneous features. Specifically, we transform scATAC-seq data into Gene Activity Scores based on prior knowledge to harmonize heterogeneous features. As the feature transformation would result in information loss, we introduce the raw ATAC-seq embeddings to preserve the original information. To achieve a balance between interomics alignment and intraomics heterogeneity, we propose a dual alignment strategy. Specifically, scSHEFT employs an anchor-based approach to align interomics anchor pairs and a contrastive-based strategy to preserve cellular heterogeneity within each omics layer. Benchmarking scSHEFT against 11 state-of-the-art methods across seven data sets demonstrates its superiority in handling data sets of varying scales and technical noise.

Genome research, pages 387-396. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863186/pdf/
Citations: 0