
Genome Biology: Latest Articles

Structure-enhanced graph meta learning for few-shot gene regulatory network inference
IF 12.3 Zone 1 Biology Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-11-20 DOI: 10.1186/s13059-025-03860-8
Weiming Yu, Zhuobin Chen, Yaohua Hu, Jing Qin, Le Ou-Yang
Inferring gene regulatory networks (GRNs) is essential for understanding biological regulation. Although numerous deep learning approaches have been developed for GRN inference, most require large amounts of labeled data. We present Meta-TGLink, a structure-enhanced graph meta-learning model for few-shot GRN inference. By formulating GRN inference as a link prediction task, Meta-TGLink captures transferable regulatory patterns while reducing dependence on extensive labeled datasets. The model combines graph neural networks with Transformer architectures to integrate relational and positional information, thereby improving predictive performance under data-scarce conditions. Experiments on real datasets demonstrate its superiority over state-of-the-art baselines, particularly in cross-domain few-shot scenarios.
Citations: 0
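The Meta-TGLink abstract above frames GRN inference as link prediction. As a rough illustration of that framing only (not the authors' model, which combines graph neural networks with Transformers), the sketch below turns a set of known regulator-target edges into labelled candidate pairs and scores each pair with a sigmoid over an embedding dot product. The gene names, the random embeddings, and the helper names `build_link_examples` and `score_links` are all hypothetical.

```python
import numpy as np

def build_link_examples(tfs, genes, known_edges):
    """Enumerate (regulator, target) pairs and label them 1 if the edge is known."""
    pairs, labels = [], []
    for tf in tfs:
        for gene in genes:
            if tf == gene:
                continue  # skip self-loops in this toy setup
            pairs.append((tf, gene))
            labels.append(1 if (tf, gene) in known_edges else 0)
    return pairs, np.array(labels)

def score_links(embeddings, pairs):
    """Score each pair as sigmoid(<e_regulator, e_target>); a stand-in for a learned decoder."""
    src = np.stack([embeddings[a] for a, _ in pairs])
    dst = np.stack([embeddings[b] for _, b in pairs])
    logits = np.sum(src * dst, axis=1)
    return 1.0 / (1.0 + np.exp(-logits))

# Toy data (hypothetical): four genes, two of which act as regulators.
genes = ["G1", "G2", "G3", "G4"]
tfs = ["G1", "G2"]
known_edges = {("G1", "G3"), ("G2", "G4")}

rng = np.random.default_rng(0)
embeddings = {g: rng.normal(size=8) for g in genes}  # placeholder for learned embeddings

pairs, labels = build_link_examples(tfs, genes, known_edges)
scores = score_links(embeddings, pairs)
```

In a few-shot setting, only a handful of the labelled pairs would be available for the target dataset; the embeddings and the scoring function would then be learned rather than drawn at random as they are here.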
X-LDR: an atlas of linkage disequilibrium across species
IF 12.3 Zone 1 Biology Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-11-20 DOI: 10.1186/s13059-025-03863-5
Tian-Neng Zhu, Xing Huang, Meng-Yuan Yang, Guo-An Qi, Qi-Xin Zhang, Feng Lin, Wenjing Zhang, Zhe Zhang, Xin Jin, Hou-Feng Zheng, Hai-Ming Xu, Shizhou Yu, Guo-Bo Chen
To reach a genomic scale illustration for linkage disequilibrium (LD), we introduce X-LDR, a stochastic algorithm for biobank-scale data ($\mathcal{O}(nmB)$, $n$ the sample size, $m$ the number of SNPs, and $B$ iterations). X-LDR can scale the entire genome to high-resolution LD grids, such as nearly 9 million LD grids for UK Biobank ($n \approx 300{,}000$ and $m \approx 4.2$ million). Various characteristics of LD are discovered in terms of their biological annotation. We also present an unprecedented LD atlas for 25 reference populations that contours the diversity of interspecies LD. The algorithms have been implemented in C++.
Citations: 0
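The X-LDR abstract above states a cost of O(nmB) but does not describe the estimator. One family of randomized methods with that cost is Hutchinson-style trace estimation, sketched below for the mean squared LD across all SNP pairs. This is an assumption chosen for illustration, not a description of X-LDR's actual implementation; the function names and the simulated genotypes are hypothetical.

```python
import numpy as np

def standardize(G):
    """Centre and scale each SNP column to mean 0, variance 1."""
    return (G - G.mean(axis=0)) / G.std(axis=0)

def mean_ld_exact(X):
    """Mean r^2 over all SNP pairs via the full m x m matrix: O(n m^2)."""
    n, m = X.shape
    R = X.T @ X / n
    return np.sum(R ** 2) / m ** 2

def mean_ld_stochastic(X, B, rng):
    """Estimate the same quantity with B random probes: O(n m B).

    For a symmetric R, E[||R z||^2] = trace(R^2) = sum of r_jk^2 when z has
    independent +/-1 entries, and R z needs only two matrix-vector products.
    """
    n, m = X.shape
    total = 0.0
    for _ in range(B):
        z = rng.choice([-1.0, 1.0], size=m)
        Rz = X.T @ (X @ z) / n  # never forms the m x m LD matrix
        total += Rz @ Rz
    return total / B / m ** 2

# Simulated genotypes (hypothetical): 200 samples, 50 independent SNPs.
rng = np.random.default_rng(0)
freqs = rng.uniform(0.1, 0.5, size=50)
G = rng.binomial(2, freqs, size=(200, 50)).astype(float)
X = standardize(G)

exact = mean_ld_exact(X)
approx = mean_ld_stochastic(X, B=2000, rng=rng)
```

The point of the design is that each probe touches the genotype matrix only through matrix-vector products, so memory stays at O(nm) and the full LD matrix is never materialized.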
Cell type-specific inference from bulk RNA-sequencing data by integrating single-cell reference profiles via EPIC-unmix
IF 12.3 Zone 1 Biology Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-11-20 DOI: 10.1186/s13059-025-03847-5
Chenwei Tang, Quan Sun, Xinyue Zeng, Gang Li, Xiaoyu Yang, Fei Liu, Jinying Zhao, Yin Shen, Boxiang Liu, Jia Wen, Yun Li
Cell type-specific analysis is crucial for uncovering biological insights hidden in bulk tissue data, yet single-cell or single-nuclei approaches are often cost-prohibitive for large samples. We introduce EPIC-unmix, a novel two-step empirical Bayesian method combining reference single-cell/single-nuclei and bulk RNA-seq data to improve cell type-specific inference, accounting for the difference between reference and target datasets. Under comprehensive simulations, we demonstrate that EPIC-unmix outperforms alternative methods in accuracy. Applied to Alzheimer’s disease brain RNA-seq data, EPIC-unmix identifies multiple differentially expressed genes in a cell type-specific manner, and empowers cell type-specific eQTL analysis.
Citations: 0
scSpecies: enhancement of network architecture alignment in comparative single-cell studies
IF 12.3 Zone 1 Biology Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-11-20 DOI: 10.1186/s13059-025-03866-2
Clemens Schächter, Maren Hackenberg, Martin Treppner, Hanne Raum, Joschka Bödecker, Harald Binder
Animals can provide meaningful context for human single-cell data. To transfer information between species, we propose a deep learning approach that pre-trains a conditional variational autoencoder on animal data and transfers its final encoder layers to a human network architecture. Our approach then aligns latent spaces by leveraging data-level and model-learned similarities. We utilize this for label transfer and differential gene expression analysis in cross-species pairs of liver, adipose tissue, and glioblastoma datasets. Our results are robust even when gene sets differ, or datasets are small. Thus, we reliably exploit similarities between species to provide context for human single-cell data.
Citations: 0
Phased epigenomics and methylation inheritance in a historical Vitis vinifera hybrid
IF 12.3 Zone 1 Biology Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-11-17 DOI: 10.1186/s13059-025-03858-2
Noé Cochetel, Amanda M. Vondras, Rosa Figueroa-Balderas, Joel Liou, Paul Peluso, Dario Cantu
Epigenetic modifications, such as DNA methylation, regulate transcription and influence key biological traits. While many efforts were made to understand their stability in annual crops, their long-term persistence in clonally propagated plants remains poorly understood. Grapevine (Vitis vinifera) provides a unique model, with cultivars vegetatively propagated for centuries. Here, we assemble the phased genomes of Cabernet Sauvignon and its parental lineages, Cabernet Franc and Sauvignon Blanc, using HiFi long-reads and a gene map tenfold denser than existing maps. Using three clones per cultivar, we quantify methylation with very consistent short- and long-read sequencing and ensure both varietal representativeness and assessment of clonal variability. We leverage the parent-progeny sequence graph to highlight allele-specific methylation and conserved transcriptomic patterns for genes and small RNA. Such a format is essential to integrate multi-omics data and reveals that, despite less clonal conservation than genetic polymorphisms, methylation marks are remarkably inherited. By further demonstrating the linear-reference limitations, we determine that the correct representation of genetic variants by the sequence graph is crucial for the accurate allelic quantification of the methylome. These findings reveal the remarkable stability of epigenetic marks in a model propagated by asexual reproduction. Using a phased sequence graph, we introduce a scalable framework that accounts for genomic variation, accurately quantifies allele-specific methylation, and supports multi-omics integration such as our evaluation of the transcriptional impact of epigenetic inheritance. This approach has broad implications for perennial crops, where epigenetic variation could influence traits relevant to breeding, adaptation, and long-term agricultural sustainability.
Citations: 0
Double-stranded DNA deaminase DddAE1347A can increase the efficiency and targeting range of cytidine base editors
IF 12.3 Zone 1 Biology Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-11-17 DOI: 10.1186/s13059-025-03849-3
Yuqiang Qian, Fengjiao Hui, Wenchao Niu, Di Wang, Yang Hao, Qingying Meng, Siyu Ren, Deqiang Kong, Heng Gong, Jiayu Wu, Kexin Chen, Muna Alariqi, Junping Gao, Zhanjun Li, Shuangxia Jin
Cytidine base editors (CBEs) consist of a single-strand specific cytidine deaminase fused to Cas9 nickase, enabling efficient C-to-T conversion across diverse organisms. Enhancing editing range and efficiency of these tools is essential for expanding their applications. In this study, we report that fusing a double-stranded DNA-specific cytosine deaminase DddAE1347A to CBEs significantly improves editing activity and broadens the editing window in cell lines, embryos, tobacco, and cotton. Compared to BE4max, the optimized DddAE1347A-BE4max exhibits up to a 93-fold increase in editing efficiency, achieving up to 52% efficiency at C14 and C15 in cell lines. Further investigation reveals that DddAE1347A is compatible with various Cas9 variants (SpCas9, SpaCas9, and Nme2Cas9) and deaminase variants (rA1, A3G, and A3A). Additionally, we demonstrate that cytosine deaminases with single-stranded DNA activity fail to enhance the CBE system. In contrast, various DddA variants can improve CBE editing activity at PAM-proximal cytosine positions, highlighting the modularity of fusion between DddAs and CBEs. These findings suggest that the double-stranded DNA-specific cytosine deaminase protein can act as an engineered fusion module in the CBE system, altering the performance (window/efficiency) of CBEs.
Citations: 0
KegAlign: optimizing pairwise alignments with diagonal partitioning
IF 12.3 Zone 1 Biology Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-11-17 DOI: 10.1186/s13059-025-03830-0
A. Burak Gulhan, Richard Burhans, Robert Harris, Mahmut Kandemir, Maximilian Haeussler, Anton Nekrutenko
Advances in sequencing and assembly allow the creation of thousands of genome assemblies. However, producing multiple alignments required for their analysis lags behind due to the time-consuming process of pairwise alignment, typically performed by the slow but sensitive tool lastZ. Here, we develop KegAlign, an optimized GPU-enabled pairwise aligner. KegAlign employs a novel diagonal partitioning parallelization strategy and leverages advanced GPU features. It can compute a human/mouse alignment in under 6 h on a GPU-containing node without pre-partitioning, maintaining lastZ-level sensitivity crucial for divergent genomes. KegAlign is available as source code, a Conda package, and a user-friendly Galaxy workflow.
Citations: 0
ricePSP: a database of rice phase separation-associated proteins
IF 12.3 Zone 1 Biology Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-11-17 DOI: 10.1186/s13059-025-03842-w
Runxin Gao, Minrong Guo, Shiping Luan, Weizhi Ouyang, Xingwang Li
Multivalent interactions between proteins with intrinsically disordered regions or prion-like domains can drive liquid–liquid phase separation (LLPS) and form membraneless condensates essential for diverse cellular functions. Here, we predict phase separation scores for all annotated rice proteins and present ricePSP ( https://ricepsp.github.io/ ), a database of phase separation-associated proteins. AlphaFold structural predictions further validate the phase separation potential of these proteins. As a proof of concept, we apply ricePSP to identify flowering-related phase separation proteins, revealing insights into how LLPS may regulate flowering. Collectively, ricePSP provides a valuable resource for studying crop phase separation proteins and LLPS-related mechanisms in crop trait regulation.
Citations: 0
DeepCOI: a large language model-driven framework for fast and accurate taxonomic assignment in animal metabarcoding
IF 12.3 Zone 1 Biology Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-11-17 DOI: 10.1186/s13059-025-03861-7
Ho-Jin Gwak, Mina Rho
Metabarcoding remains challenging due to incomplete taxonomic annotations and computationally intensive processes. We present DeepCOI, a large language model-based classifier pre-trained on seven million cytochrome c oxidase I gene sequences. DeepCOI enables fast and accurate taxonomic assignment across eight major phyla, achieving an AU-ROC of 0.958 and AU-PR of 0.897, outperforming existing methods while significantly reducing inference time. Additionally, DeepCOI demonstrates interpretability by identifying taxonomically informative sequence positions. By integrating large-scale datasets and self-supervised learning, DeepCOI enhances both the accuracy and efficiency of metabarcoding processes, providing a scalable solution for biodiversity assessment and environmental monitoring.
Citations: 0
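The DeepCOI abstract above reports an AU-ROC figure. For readers unfamiliar with the metric, the sketch below computes AU-ROC for a single binary label directly from its pairwise definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. How DeepCOI aggregates the metric across taxa is not stated in the abstract, so this covers only the binary case; the function name `au_roc` and the toy inputs are hypothetical.

```python
def au_roc(labels, scores):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Toy example: three of the four positive-negative pairs are ordered correctly.
toy_auc = au_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The double loop is quadratic in the number of examples; rank-based formulas give the same value in O(N log N) and are what library implementations typically use.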
A comparison of computational methods for expression forecasting
IF 12.3 Zone 1 Biology Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-11-17 DOI: 10.1186/s13059-025-03840-y
Eric Kernfeld, Yunxiao Yang, Joshua S. Weinstock, Alexis Battle, Patrick Cahan
Diverse machine learning methods promise to forecast gene expression changes in response to novel genetic perturbations. However, these methods’ accuracy is not well characterized. We created a benchmarking platform that combines a panel of 11 large-scale perturbation datasets with an expression forecasting software engine that encompasses or interfaces to a wide variety of methods. We used our platform to assess methods, parameters, and sources of auxiliary data, finding that it is uncommon for expression forecasting methods to outperform simple baselines. Our platform will serve as a resource to improve methods and to identify contexts in which expression forecasting can succeed.
Citations: 0