
NAR Genomics and Bioinformatics: Latest Articles

A landscape of metallophore synthesis and uptake potential of the genus Staphylococcus.
IF 2.8 Q1 GENETICS & HEREDITY Pub Date : 2025-12-17 eCollection Date: 2025-12-01 DOI: 10.1093/nargab/lqaf183
Mathias Witte Paz, Alina Bitzer, Kay Nieselt, Simon Heilbronner

Metallophores are secondary metabolites that enable bacterial growth in metal-limited environments such as the human nasal microbiome. While synthesis and uptake of metallophores in Staphylococcus aureus are well characterized, the diversity across the Staphylococcus genus remains unclear. We performed a comprehensive bioinformatic analysis of 77 representative species, as well as over 1800 strains, to map metallophore biosynthetic gene clusters (BGCs) and uptake systems. Staphyloferrin A (SF-A) biosynthesis was widely conserved, though disrupted loci were found in some species, with some of them appearing to have replaced SF-A with a newly discovered, still uncharacterized, BGC. In contrast, staphyloferrin B and staphylopine production were restricted to select species. Uptake systems were more broadly distributed, showing evidence of "cheating" species that lack biosynthesis, but retain the required lipoproteins for metallophore usage. Staphylococcus lugdunensis exemplifies this, encoding multiple uptake systems without producing known metallophores. Strain-level variation was also observed, particularly with specific cases of SF-A truncation, but also for the diversity of lipoprotein receptors. These findings highlight the diversity of metallophore systems, suggesting diverse metallophore-dependent cooperation and competition within the Staphylococcus genus. This work provides a foundation for future experimental studies to identify the role of metallophores in microbial community interactions.

NAR Genomics and Bioinformatics, vol. 7, no. 4, lqaf183. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12709192/pdf/
Citations: 0
Joint analysis of de novo mutations from autism spectrum disorder, schizophrenia, congenital heart disease, and other developmental disorders improves detection power and implicates shared molecular pathways and CNS processes.
IF 2.8 Q1 GENETICS & HEREDITY Pub Date : 2025-12-17 eCollection Date: 2025-12-01 DOI: 10.1093/nargab/lqaf162
Marc Kealhofer, Ruth Brown, Brien P Riley, Tan-Hoang Nguyen

Rare exonic variant studies have previously implicated overlapping risk genes and pathways for autism spectrum disorder (ASD), severe, undiagnosed developmental disorders (UDDs), intellectual disability (ID), congenital heart disease (CHD), and schizophrenia (SCZ). Here, we use a two-trait Bayesian integrative analysis approach on 43 287 ASD, UDD/ID, CHD, and SCZ case trios to increase statistical power for gene discovery and to identify shared risk genes. At a posterior probability > 0.80, we identified 180 candidate risk genes for ASD, 315 for UDD/ID, 49 for CHD, and 47 for SCZ, including genes not previously reported, and also detected shared risk genes in pair-wise analyses. Gene set enrichment analysis of the ASD-UDD/ID, ASD-SCZ, and UDD/ID-SCZ shared risk genes overwhelmingly implicated gene sets associated with the synapse and epigenetic modification, while CHD-ASD shared risk genes were enriched in cell cycle phase transition gene sets, and CHD-UDD/ID shared risk genes implicated cardiac development. ASD-UDD/ID risk genes had elevated expression in interneurons and pyramidal cells, while ASD-UDD/ID and CHD-UDD/ID shared risk genes showed elevated connectivity in protein-protein interaction networks. Leveraging information across disorders with genetic overlap can both increase power for candidate risk gene discovery and help elucidate shared genetic mechanisms.

NAR Genomics and Bioinformatics, vol. 7, no. 4, lqaf162. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12709184/pdf/
Citations: 0
CORGIAS: identifying correlated gene pairs by considering evolutionary history in a large-scale prokaryotic genome dataset.
IF 2.8 Q1 GENETICS & HEREDITY Pub Date : 2025-12-12 eCollection Date: 2025-12-01 DOI: 10.1093/nargab/lqaf182
Yuki Nishimura, Kimiho Omae, Kento Tominaga, Wataru Iwasaki

The recent expansion of prokaryotic genomes reveals many ortholog groups (OGs) whose function cannot be inferred from conventional, sequence similarity-based annotation methods, especially in metagenome-assembled genomes. Phylogenetic profiling is one of the promising methods to annotate these OGs: it identifies functional relationships between OGs from the co- or anti-occurrence of their distributions rather than from sequence similarity. Here, we propose two new phylogenetic methods for large-scale data, Ancestral State Adjustment (ASA) and Simultaneous EVolution test (SEV), which consider the ancestral state of OG presence/absence. In evaluations using three distinct prokaryotic datasets, ASA and SEV showed better or comparable performance to both established and recently proposed methods for large-scale data. We compared the functionally related OGs detected by each method and found that SEV and its predecessor can identify slowly evolving OGs, such as housekeeping genes. In contrast, ASA and its predecessors can detect functionally related OGs that tend to be gained or lost in a fixed order, indicating a strong evolutionary constraint that provides clues for functional prediction. Using matrix multiplication, we also showed that SEV is scalable to the latest genome databases.
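As a rough illustration of the phylogenetic profiling idea described in this abstract, the sketch below scores OG pairs by how often they co-occur across genomes. It is a naive baseline, not the ASA or SEV method, and it ignores ancestral states entirely. The function names and the toy presence/absence matrix are hypothetical. The co-occurrence counts are the matrix product of the binary profile matrix with its transpose, written out in plain Python here; with NumPy the same product would typically be a single `P @ P.T`.

```python
def cooccurrence_counts(profiles):
    """Return M where M[i][j] is the number of genomes containing both OG i and OG j.

    This equals the matrix product P * P^T of the binary presence/absence
    matrix P (rows = OGs, columns = genomes).
    """
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in profiles]
            for row_i in profiles]


def jaccard_scores(profiles):
    """Jaccard similarity of presence/absence profiles for every OG pair."""
    both = cooccurrence_counts(profiles)
    sizes = [sum(row) for row in profiles]  # genomes containing each OG
    n = len(profiles)
    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            either = sizes[i] + sizes[j] - both[i][j]  # genomes with OG i or OG j
            scores[i][j] = both[i][j] / either if either else 0.0
    return scores


# Hypothetical toy data: 4 OGs (rows) across 4 genomes (columns).
profiles = [
    [1, 1, 0, 0],  # OG_A
    [1, 1, 0, 0],  # OG_B: always co-occurs with OG_A
    [0, 0, 1, 1],  # OG_C: never co-occurs with OG_A (anti-occurrence)
    [1, 0, 1, 0],  # OG_D: partial overlap with OG_A
]
scores = jaccard_scores(profiles)
```

A score near 1 flags a candidate functional link, and a score of 0 between two widespread OGs hints at anti-occurrence. Because closely related genomes are not independent observations, raw scores like these can be inflated by shared ancestry, which is the problem that methods accounting for evolutionary history aim to address.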

NAR Genomics and Bioinformatics, vol. 7, no. 4, lqaf182. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12699329/pdf/
Citations: 0
RNA-binding proteins connect exon usage to the chromatin.
IF 2.8 Q1 GENETICS & HEREDITY Pub Date : 2025-12-10 eCollection Date: 2025-12-01 DOI: 10.1093/nargab/lqaf161
Hanah Robertson, Hoang T T Do, Volkhard Helms

Exonic enrichment of histone marks hints at their role in regulating alternative splicing. This study aims to connect the transcriptome and epigenome in the context of splicing outcomes in embryonic cell lines. The tools rMATS and MANorm were used to obtain estimates of differential inclusion of exons and differential enrichment of epigenetic signals, respectively. Two classes of alternative exons were identified in embryonic cell lines: those differentially co-occurring with at least one mark among H3K27ac, H3K27me3, H3K36me3, H3K9me3, and H3K4me3, and those marked by none of these marks. Binary classifiers were trained using RNA-binding protein (RBP) binding affinities on the flanking regions of these exons. This resulted in a set of RBPs whose putative binding was predicted to associate local chromatin modification marking an exon with its differential inclusion; some of these RBPs have been experimentally shown to interact with histone mark reader proteins. We speculate that sequence signals harbored at exon-intron flanks regulate differential splicing of exons marked by at least one of the five epigenetic signatures. Finally, eCLIP data from ENCODE for the HepG2 and K562 cell lines support TIA1 and U2AF2 as potential episplicing RBPs, as predicted by our model in the embryonic cell lines.

NAR Genomics and Bioinformatics, vol. 7, no. 4, lqaf161. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12693533/pdf/
Citations: 0
Genomic data representations for horizontal gene transfer detection.
IF 2.8 Q1 GENETICS & HEREDITY Pub Date : 2025-12-10 eCollection Date: 2025-12-01 DOI: 10.1093/nargab/lqaf165
Andre Jatmiko Wijaya, Aleksandar Anžel, Hugues Richard, Georges Hattab

Horizontal gene transfer (HGT) accelerates the spread of antimicrobial resistance (AMR) via mobile genetic elements allowing pathogens to acquire resistance genes across species. This process drives the evolution of multidrug-resistant "superbugs" in clinical settings. Detection of HGT is critical to mitigating AMR, but traditional methods based on sequence assembly or comparative genomics lack resolution for complex transfer events. While machine learning (ML) promises improved detection, several studies in other domains have demonstrated that data representations will strongly influence its performance. There is, however, no clear recommendation on the best data representation for HGT detection. Here, we evaluated 44 genomic data representations using five ML models across four data sets. We demonstrate that ML performance is highly dependent on the genomic data representation. The RCKmer-based representation (k = 7) paired with a support vector machine is found to be optimal (F1: 0.959; MCC: 0.908), outperforming other approaches. Moreover, models trained on multi-species data sets are shown to generalize better. Our findings suggest that genomic surveillance benefits from task-specific genome data representations. This work provides state-of-the-art, fine-tuned models for identifying and annotating genomic islands that will enable proper detection of transfer of AMR-related genes between species.
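To make the notion of a k-mer-based genomic data representation concrete, here is a minimal sketch of a reverse-complement k-mer frequency vector, which is how the RCKmer descriptor is commonly understood: each k-mer is merged with its reverse complement before counting. This is an assumption about the representation, not the authors' code; the function names are hypothetical, and the resulting vector would still need to be passed to a classifier such as a support vector machine.

```python
from collections import Counter
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Pick one representative for a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def rc_kmer_features(seq: str, k: int = 7) -> dict[str, float]:
    """Normalized counts of canonical k-mers, in a fixed (sorted) key order."""
    seq = seq.upper()
    keys = sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows with N or other symbols
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    return {key: (counts[key] / total if total else 0.0) for key in keys}


# Toy usage with k = 2: the windows AC, CG, GT collapse to AC, CG, AC.
features = rc_kmer_features("ACGT", k=2)
```

Merging reverse complements makes the representation independent of which strand was sequenced and halves the feature space for odd k: with k = 7 there are 4^7 / 2 = 8192 features instead of 16384.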

NAR Genomics and Bioinformatics, vol. 7, no. 4, lqaf165. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12693543/pdf/
Citations: 0
Genome assemblies of Indian desi cattle reveal hotspots of rearrangements and immune-related genetic diversity.
IF 2.8 Q1 GENETICS & HEREDITY Pub Date : 2025-12-10 eCollection Date: 2025-12-01 DOI: 10.1093/nargab/lqaf153
Sarwar Azam, Abhisek Sahu, Mohammad Kadivella, Aamir Waseem Khan, Mahesh Neupane, Curtis P Van Tassell, Benjamin D Rosen, Ravi Kumar Gandham, Subha Narayan Rath, Subeer S Majumdar

India, home to the world's largest cattle population, hosts native dairy breeds essential to its agricultural economy because of their adaptability and resilience. This study characterizes the genomes of five prominent breeds (Gir, Kankrej, Red Sindhi, Sahiwal, and Tharparkar), highlighting their unique genomic characteristics. The de novo assemblies ranged from 2.70 to 2.78 Gb in size, with 90% of the genomes assembled in just 56 to 1663 scaffolds. The use of reference-guided scaffolding further enhanced these genomes, resulting in 93.3%-96.7% pseudomolecule coverage with strong BUSCO scores (94.1%-95.5%). Comparative analyses revealed 87%-95% synteny with the Brahman genome and identified 19.84-153.16 Mb of structural rearrangements per genome, including inversions, translocations, and duplications. Synteny diversity analysis uncovered 10 643 perfectly collinear regions spanning 87.3 Mb and 6622 hotspots of rearrangement (HOT regions) covering 55.18 Mb. These HOT regions, characterized by high synteny diversity, were significantly enriched with immune-related genes. Moreover, immune-related gene clusters, including major histocompatibility complex, natural killer complex, and leukocyte receptor complex, were identified within HOT regions in the desi reference genome. Our findings provide valuable insights into the genetic diversity of desi cattle breeds. The high-quality genome assemblies generated in this study will serve as valuable resources for future research in genetic improvement, disease resistance, and environmental adaptation.
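The statement that 90% of each genome is contained in a given number of scaffolds reads like the standard L90 contiguity statistic, although the abstract does not name it. Under that assumption, the sketch below shows how such a number could be computed from a list of scaffold lengths. The function name and the toy lengths are hypothetical.

```python
def l_statistic(lengths, fraction=0.9):
    """Smallest number of scaffolds, taken longest first, whose combined
    length reaches `fraction` of the total assembly size (L90 for 0.9)."""
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    target = fraction * sum(lengths)
    running = 0
    for n, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return n
    return len(lengths)


# Toy assembly of four scaffolds (arbitrary units).
toy_lengths = [25, 60, 5, 10]
l90 = l_statistic(toy_lengths)        # 60 + 25 + 10 = 95 >= 90
l50 = l_statistic(toy_lengths, 0.5)   # 60 >= 50
```

A lower value means a more contiguous assembly, since fewer scaffolds are needed to cover most of the genome.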

NAR Genomics and Bioinformatics, vol. 7, no. 4, lqaf153. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12693530/pdf/
Citations: 0
HaploExplore, a software specifically designed for the detection of minor allele (MiA-) haploblocks.
IF 2.8 Q1 GENETICS & HEREDITY Pub Date : 2025-12-10 eCollection Date: 2025-12-01 DOI: 10.1093/nargab/lqaf186
Matilde Manetti, Samuel Hiet, Myriam Rahmouni, Jean-Louis Spadoni, Alice Dobiecki, Marco Lamanda, Maxime Tison, Taoufik Labib, Cristina Giuliani, Sigrid Le Clerc, Jean-François Deleuze, Jean-François Zagury

Haplotype blocks in the genome are informative about evolutionary processes, and they play a pivotal role in describing the genomic variability across human populations and susceptibility/resistance to diseases. Several software tools have been developed for haplotype block detection, but they do not distinguish between the impacts of major and minor single-nucleotide polymorphism (SNP) alleles. In this study, we present a powerful haploblock detection software, specifically designed for identifying haploblocks associated with SNP minor alleles (MiA-haploblocks). These haploblocks are particularly important as they can significantly influence phenotypic traits, offering a novel approach for studying genetic associations and complex traits. HaploExplore operates on VCF files containing phased data, exhibiting rapid processing times, and generating user-friendly outputs. Results converge when analyzing populations of 100 individuals or more. A comparative analysis of HaploExplore against other haploblock detection software revealed its superiority in terms of either simplicity, flexibility, or speed, with the unique capability to target minor alleles. HaploExplore will be very useful for evolutionary genomics and for GWAS analysis in human diseases, given that the effects of genetic associations may accumulate within a specific haploblock.

基因组中的单倍型块是进化过程的信息,它们在描述人类群体的基因组变异性和对疾病的易感性/抗性方面起着关键作用。已经开发了一些软件用于单倍型块检测,但它们不能区分主要和次要单核苷酸多态性(SNP)等位基因的影响。在这项研究中,我们提出了一个功能强大的单倍体检测软件,专门用于识别与SNP小等位基因单倍体相关的单倍体(MiA-haploblocks)。这些单倍体尤其重要,因为它们可以显著影响表型性状,为研究遗传关联和复杂性状提供了一种新的方法。HaploExplore在包含阶段性数据的VCF文件上操作,显示快速的处理时间,并生成用户友好的输出。当分析100个或更多个体的种群时,结果趋于一致。HaploExplore与其他单倍体块检测软件的对比分析显示其在简单性、灵活性或速度方面的优势,具有针对次要等位基因的独特能力。HaploExplore将对进化基因组学和人类疾病的GWAS分析非常有用,因为遗传关联的影响可能在特定的单倍体块中积累。
{"title":"HaploExplore, a software specifically designed for the detection of minor allele (MiA-) haploblocks.","authors":"Matilde Manetti, Samuel Hiet, Myriam Rahmouni, Jean-Louis Spadoni, Alice Dobiecki, Marco Lamanda, Maxime Tison, Taoufik Labib, Cristina Giuliani, Sigrid Le Clerc, Jean-François Deleuze, Jean-François Zagury","doi":"10.1093/nargab/lqaf186","DOIUrl":"10.1093/nargab/lqaf186","url":null,"abstract":"<p><p>Haplotype blocks in the genome are informative of evolutionary processes and they play a pivotal role in describing the genomic variability across human populations and susceptibility/resistance to diseases. Several software have been developed for haplotype block detection, but they do not distinguish between the impacts of major and minor single nucleotides polymorphism (SNP) alleles. In this study, we present a powerful haploblock detection software, specifically designed for identifying haploblocks associated with SNP minor allele haploblocks (MiA-haploblocks). These haploblocks are particularly important as they can significantly influence phenotypic traits, offering a novel approach for studying genetic associations and complex traits. HaploExplore operates on VCF files containing phased data, exhibiting rapid processing times, and generating user-friendly outputs. Results converge when analyzing populations of 100 individuals or more. A comparative analysis of HaploExplore against other haploblock detection software revealed its superiority in terms of either simplicity, flexibility, or speed, with the unique capability to target minor alleles. 
HaploExplore will be very useful for evolutionary genomics and for GWAS analysis in human diseases, given that the effects of genetic associations may accumulate within a specific haploblock.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"7 4","pages":"lqaf186"},"PeriodicalIF":2.8,"publicationDate":"2025-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12693498/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145744731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SAFARI: pangenome alignment of ancient DNA using purine/pyrimidine encodings.
IF 2.8 Q1 GENETICS & HEREDITY Pub Date : 2025-12-10 eCollection Date: 2025-12-01 DOI: 10.1093/nargab/lqaf170
Joshua Rubin, Jan van Waaij, Louis Kraft, Jouni Sirén, Peter Wad Sackett, Gabriel Renaud

Aligning DNA sequences retrieved from fossils or other paleontological artifacts, referred to as ancient DNA (aDNA), is particularly challenging because the sequences are short and carry chemical damage that creates a specific pattern of substitutions (C→T and G→A). In addition, the heightened divergence between the sample and the reference genome exacerbates reference bias. This bias can be mitigated by aligning to pangenome graphs, which incorporate documented organismic variation, but this approach still suffers from the substitution patterns caused by chemical damage. We introduce the RYmer index, a variant of the commonly used minimizer index that represents purines (A, G) and pyrimidines (C, T) as R and Y, respectively. This creates an indexing scheme robust to the aforementioned chemical damage. We implemented SAFARI (Sensitive Alignments From A RYmer Index), an aDNA damage-aware version of the pangenome aligner vg giraffe, which uses RYmers to rescue alignments containing deaminated seeds. For highly damaged samples, the recovery rate could be upwards of 10%, an amount which could well affect downstream results. We show that our approach produces more correct alignments from aDNA sequences than current approaches while maintaining a tolerable rate of spurious alignments. In addition, we demonstrate that our algorithm improves the estimate of the rate of aDNA damage, especially for highly damaged samples. Crucially, we show that this improved alignment can directly translate into better insights gained from the data by showcasing its integration with a number of extant pangenome tools.
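
The purine/pyrimidine encoding described in this abstract can be illustrated with a minimal Python sketch. It is not SAFARI's implementation: the helper names (`ry_encode`, `kmers`, `shared_seeds`), the toy sequences, and the seed length of 4 are all assumptions made for illustration, and plain k-mers stand in for the minimizers that vg giraffe actually uses.

```python
# Illustrative sketch only; not SAFARI's actual code.
# Purines (A, G) collapse to R and pyrimidines (C, T) collapse to Y.
RY_TABLE = str.maketrans("AGCT", "RRYY")


def ry_encode(seq: str) -> str:
    """Return the purine/pyrimidine (R/Y) encoding of a DNA sequence."""
    return seq.upper().translate(RY_TABLE)


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (toy stand-in for seeds)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_seeds(a: str, b: str, k: int) -> int:
    """Count the k-mers that two sequences have in common."""
    return len(kmers(a, k) & kmers(b, k))


# Hypothetical reference fragment and a read with two C->T substitutions,
# mimicking deamination damage.
reference = "ACGTCCGATG"
damaged = "ATGTCTGATG"

nucleotide_hits = shared_seeds(reference, damaged, 4)
ry_hits = shared_seeds(ry_encode(reference), ry_encode(damaged), 4)

print("shared nucleotide seeds:", nucleotide_hits)
print("shared R/Y seeds:", ry_hits)
```

Because C→T and G→A both stay within one class (pyrimidine or purine), the damaged read keeps its R/Y seeds even where nucleotide seeds are lost. The cost is a two-letter alphabet, so seeds are less specific, which is consistent with the abstract's remark about keeping spurious alignments at a tolerable rate.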

Citations: 0
Genome sequence assembly and annotation of MATA and MATB strains of Yarrowia lipolytica.
IF 2.8 Q1 GENETICS & HEREDITY Pub Date : 2025-12-10 eCollection Date: 2025-12-01 DOI: 10.1093/nargab/lqaf175
Narges Zali, Osama El Demerdash, Kapeel Chougule, Zhenyuan Lu, Doreen Ware, Bruce Stillman

Yeast is commonly utilized in molecular and cell biology research, and Yarrowia lipolytica is favored by bioengineers due to its ability to produce copious amounts of lipids, chemicals, and enzymes for industrial applications. Y. lipolytica is a dimorphic yeast that can proliferate in aerobic and hydrophobic environments conducive to industrial use. However, there is limited knowledge about the basic molecular biology of this yeast, including how the genome is duplicated and how gene silencing occurs. Genome sequences of Y. lipolytica strains have offered insights into this yeast species and have facilitated the development of new industrial applications. Although previous studies have reported the genome sequences of a few Y. lipolytica strains, more precise sequences and annotations are valuable, particularly for studies of the biology of this yeast. To further study and characterize the molecular biology of this microorganism, a high-quality reference genome assembly and annotation has been produced for two related Y. lipolytica strains of opposite mating type, E122 (MATA) and 22301-5 (MATB). The combination of short-read and long-read sequencing of genomic DNA and of transcript cDNAs enabled the genome assembly and a comparison with a distantly related Yarrowia strain.

Citations: 0
Removal of unwanted variation in pseudobulk analysis of single-cell RNA sequencing data and the leveraging of pseudoreplicates.
IF 2.8 Q1 GENETICS & HEREDITY Pub Date : 2025-12-08 eCollection Date: 2025-12-01 DOI: 10.1093/nargab/lqaf179
Sofía Prieto León, Ewoud De Troyer, Helena Geys, Koen Van den Berge, Olivier Thas

Removing unwanted variation (RUV) is key for accurate biological interpretation in high-throughput sequencing studies. However, no standardized approach exists for pseudobulked single-cell RNA-sequencing (scRNA-seq) data. Improper implementation of RUV methods may remove biological information, jeopardizing power and false positive control in differential expression analysis. We evaluate the impact of three implementation strategies ('trails') in three RUV methods (RUV2, RUVIII, RUV4) using simulated and real biological signals in pseudobulked scRNA-seq data. Effects of technical noise under confounding and model misspecification conditions are also considered. Additionally, we introduce a novel strategy, RUVIII PBPS, to remove unwanted variation in pseudobulk differential expression analyses with insufficient technical replicates or negative control genes. Our analysis demonstrates that removing unwanted variation per cell type with RUV2 or RUVIII extracts factors associated with technical noise and controls the false discovery rate (FDR), even in the presence of confounding. RUVIII PBPS successfully controls the FDR when other standard RUV methods cannot be used due to missing technical replicates, dependence between the factor of interest and the sources of unwanted variation, and lack of plausible negative control genes.
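
For readers unfamiliar with the term, the pseudobulking step that precedes the RUV methods discussed here can be sketched as follows. This is a minimal illustration of the general aggregation idea (summing per-cell counts within each sample and cell type), not the authors' pipeline and not any RUV method. The function name `pseudobulk`, the record layout, and the toy counts are hypothetical.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum per-cell gene counts within each (sample, cell_type) group.

    `cells` is a list of dicts with keys "sample", "cell_type" and
    "counts" (a gene -> integer count mapping). This layout is an
    assumption made for the sketch.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for cell in cells:
        key = (cell["sample"], cell["cell_type"])
        for gene, count in cell["counts"].items():
            totals[key][gene] += count
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(genes) for key, genes in totals.items()}


# Toy data: four cells from two samples and two cell types.
cells = [
    {"sample": "s1", "cell_type": "T", "counts": {"geneA": 3, "geneB": 0}},
    {"sample": "s1", "cell_type": "T", "counts": {"geneA": 1, "geneB": 2}},
    {"sample": "s1", "cell_type": "B", "counts": {"geneA": 5, "geneB": 1}},
    {"sample": "s2", "cell_type": "T", "counts": {"geneA": 2, "geneB": 4}},
]

profiles = pseudobulk(cells)
for key, genes in sorted(profiles.items()):
    print(key, genes)
```

Each resulting profile acts like one bulk RNA-seq library per sample and cell type, which is the unit on which differential expression analysis, and any removal of unwanted variation, would then operate.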

Citations: 0