Genome Biology: latest articles

A reinforcement learning-based approach for dynamic privacy protection in genomic data sharing beacons
IF 12.3 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-11-24 DOI: 10.1186/s13059-025-03871-5
Masoud Poorghaffar Aghdam, Sobhan Shukueian Tabrizi, Kerem Ayöz, Erman Ayday, Sinem Sav, A. Ercüment Çiçek
The rise of genomic sequencing raises privacy concerns due to the identifiable nature of genomic data. The GA4GH Beacon Project enables privacy-preserving data sharing but is vulnerable to membership inference attacks that reveal individual participation. Existing defenses, such as noise addition and query restrictions, rely on static policies that attackers can bypass. We introduce the first reinforcement learning (RL)-based dynamic defense for the beacon protocol, training defender and attacker agents in a multiplayer setting. Our approach adapts responses in real time, distinguishing users from adversaries and balancing privacy with utility against evolving threats.
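To make the membership inference risk concrete, here is a minimal toy sketch, not the authors' RL defense or attack. It assumes a beacon that answers only "is this variant present in the cohort?", and it invents the names `ToyBeacon`, `yes_fraction`, and `random_genome`, plus the cohort size and variant frequency, purely for illustration.

```python
import random


class ToyBeacon:
    """Yes/no presence oracle over a cohort; each genome is a set of variant IDs."""

    def __init__(self, genomes):
        self.genomes = genomes

    def query(self, variant):
        # "Yes" if at least one cohort member carries the variant.
        return any(variant in genome for genome in self.genomes)


def yes_fraction(beacon, variants):
    """Fraction of a person's variants that the beacon confirms (0.0 if none)."""
    variants = list(variants)
    if not variants:
        return 0.0
    return sum(beacon.query(v) for v in variants) / len(variants)


def random_genome(rng, n_variants=2000, freq=0.01):
    """Hypothetical genome: each rare variant is carried independently with probability freq."""
    return {v for v in range(n_variants) if rng.random() < freq}


rng = random.Random(0)
cohort = [random_genome(rng) for _ in range(50)]
beacon = ToyBeacon(cohort)

member_score = yes_fraction(beacon, cohort[0])             # target is in the cohort
outsider_score = yes_fraction(beacon, random_genome(rng))  # target is not
print(member_score, outsider_score)
```

Every variant carried by a cohort member is confirmed, whereas an outsider's rare variants are confirmed only when some member happens to share them. That gap is the signal a membership inference attack exploits, and it is what static defenses such as noise or query budgets try to blunt.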
Citations: 0
Long-read structural variant discovery and targeted short read genotyping enables population scale characterization of structural variation in rhesus macaques
IF 12.3 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-11-21 DOI: 10.1186/s13059-025-03873-3
Karina Ray, Christina Mulch, Samuel M. Peterson, Sebastian Benjamin, Nathan Gullicksrud, Adam J. Ericsen, Eric J. Vallender, Betsy M. Ferguson, Jeffrey D. Wall, Benjamin N. Bimber
Due to their close evolutionary relationship with humans, rhesus macaques are an important pre-clinical model. While genetic diversity driven by short nucleotide variation has long been studied in rhesus macaques, there is comparatively little known about structural variation, with most published studies focused on cross-species comparative analyses. Understanding the degree and implications of intraspecies structural variation is essential to all biomedical research using rhesus macaques as a model. Here we present long-read sequencing of 59 rhesus macaques, identifying a catalog of 339,334 structural variants (SVs), which we subsequently genotype in a cohort of 2,645 individuals with short read whole genome sequencing data to create the largest public dataset of rhesus macaque SVs. These data reveal population structure within rhesus macaque SVs based on both geographic ancestry and to a lesser degree, breeding center. While there is evidence of strong purifying selection against SVs within exons, 0.7% of SVs overlap exons, with an average of 16.9 rare SVs per subject predicted to have a high impact on protein coding sequences. Notably, rhesus macaque SVs are dominated by Alu retrotransposition events, which comprise 55.7% of SVs and suggest significantly different modes of SV formation relative to humans and great apes. This dataset represents the largest study of structural variation in rhesus macaques to date and demonstrates use of both long and short-read datasets to generate SV genotype data. These data enable the consideration of structural variation impact in rhesus macaque-based research and will also aid the development of primate pangenomes.
Citations: 0
Benchmarking deep learning methods for biologically conserved single-cell integration
IF 12.3 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-11-20 DOI: 10.1186/s13059-025-03869-z
Chenxin Yi, Jinyu Cheng, Jiajun Chen, Wanquan Liu, Junwei Liu, Yixue Li
Advancements in single-cell RNA sequencing have enabled the analysis of millions of cells, but integrating such data across samples and methods while mitigating batch effects remains challenging. Deep learning approaches address this by learning biologically conserved gene expression representations, yet systematic benchmarking of loss functions and integration performance is lacking. We evaluate 16 integration methods using a unified variational autoencoder framework, incorporating batch and cell-type information. Results reveal limitations in the single-cell integration benchmarking index (scIB) for preserving intra-cell-type information. To address this, we introduce a correlation-based loss function and enhance benchmarking metrics to better capture biological conservation. Using cell annotations from lung and breast atlases, our approach improves biological signal preservation. We propose a refined integration framework, scIB-E, and metrics that provide deeper insights into the integration process and offer guidance for advanced developments in integrating increasingly complex single-cell data. This benchmark highlights the potential of deep learning-based approaches for single-cell data integration, emphasizing the importance of biologically informed metrics and improved benchmarking strategies.
Citations: 0
Structure-enhanced graph meta learning for few-shot gene regulatory network inference
IF 12.3 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-11-20 DOI: 10.1186/s13059-025-03860-8
Weiming Yu, Zhuobin Chen, Yaohua Hu, Jing Qin, Le Ou-Yang
Inferring gene regulatory networks (GRNs) is essential for understanding biological regulation. Although numerous deep learning approaches have been developed for GRN inference, most require large amounts of labeled data. We present Meta-TGLink, a structure-enhanced graph meta-learning model for few-shot GRN inference. By formulating GRN inference as a link prediction task, Meta-TGLink captures transferable regulatory patterns while reducing dependence on extensive labeled datasets. The model combines graph neural networks with Transformer architectures to integrate relational and positional information, thereby improving predictive performance under data-scarce conditions. Experiments on real datasets demonstrate its superiority over state-of-the-art baselines, particularly in cross-domain few-shot scenarios.
Citations: 0
X-LDR: an atlas of linkage disequilibrium across species
IF 12.3 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-11-20 DOI: 10.1186/s13059-025-03863-5
Tian-Neng Zhu, Xing Huang, Meng-Yuan Yang, Guo-An Qi, Qi-Xin Zhang, Feng Lin, Wenjing Zhang, Zhe Zhang, Xin Jin, Hou-Feng Zheng, Hai-Ming Xu, Shizhou Yu, Guo-Bo Chen
To reach a genomic scale illustration for linkage disequilibrium (LD), we introduce X-LDR, a stochastic algorithm for biobank-scale data ($\mathcal{O}(nmB)$, with $n$ the sample size, $m$ the number of SNPs, and $B$ the number of iterations). X-LDR can scale the entire genome to high-resolution LD grids, such as nearly 9 million LD grids for UK Biobank ($n \approx 300{,}000$ and $m \approx 4.2$ million). Various characteristics of LD are discovered in terms of their biological annotation. We also present an unprecedented LD atlas for 25 reference populations that contours the diversity of interspecies LD. The algorithms have been implemented in C++.
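The abstract does not spell out the algorithm, so the following is only a sketch of one standard way a stochastic estimator can reach a cost linear in n, m, and B: a Hutchinson-style randomized trace estimate of the mean squared SNP correlation. It is an assumption that this resembles X-LDR; the function names, the simulated genotypes, and the choice of Rademacher probes are all illustrative. With a standardized genotype matrix X (n by m), the sum of squared pairwise correlations equals tr((X Xᵀ)²)/n², and each probe needs only two matrix-vector products.

```python
import random


def standardize(columns):
    """Scale each SNP column to mean 0 and variance 1, dropping monomorphic SNPs."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        var = sum((g - mean) ** 2 for g in col) / n
        if var > 0:
            sd = var ** 0.5
            out.append([(g - mean) / sd for g in col])
    return out


def exact_mean_r2(cols):
    """Mean squared correlation over all m*m SNP pairs: O(n m^2)."""
    n, m = len(cols[0]), len(cols)
    total = 0.0
    for a in cols:
        for b in cols:
            r = sum(x * y for x, y in zip(a, b)) / n
            total += r * r
    return total / (m * m)


def stochastic_mean_r2(cols, iterations, rng):
    """Estimate the same quantity with random probes: O(n m B)."""
    n, m = len(cols[0]), len(cols)
    acc = 0.0
    for _ in range(iterations):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]             # Rademacher probe
        u = [sum(x * zi for x, zi in zip(col, z)) for col in cols]  # u = X^T z
        w = [0.0] * n                                               # w = X u = (X X^T) z
        for col, uj in zip(cols, u):
            for i, x in enumerate(col):
                w[i] += x * uj
        acc += sum(v * v for v in w)                                # z^T (X X^T)^2 z
    trace_estimate = acc / iterations                               # approximates tr((X X^T)^2)
    return trace_estimate / (n * n * m * m)


# Hypothetical toy data: 20 independent SNPs (stored as columns), 60 individuals.
rng = random.Random(1)
n_samples, n_snps = 60, 20
genotypes = [[(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(n_samples)]
             for _ in range(n_snps)]
cols = standardize(genotypes)
exact = exact_mean_r2(cols)
estimate = stochastic_mean_r2(cols, iterations=400, rng=rng)
print(exact, estimate)
```

The exact version touches every SNP pair, which is what becomes infeasible at millions of SNPs; the probe-based version never forms the m by m correlation matrix, at the price of sampling noise that shrinks as B grows.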
Citations: 0
Cell type-specific inference from bulk RNA-sequencing data by integrating single-cell reference profiles via EPIC-unmix
IF 12.3 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-11-20 DOI: 10.1186/s13059-025-03847-5
Chenwei Tang, Quan Sun, Xinyue Zeng, Gang Li, Xiaoyu Yang, Fei Liu, Jinying Zhao, Yin Shen, Boxiang Liu, Jia Wen, Yun Li
Cell type-specific analysis is crucial for uncovering biological insights hidden in bulk tissue data, yet single-cell or single-nuclei approaches are often cost-prohibitive for large samples. We introduce EPIC-unmix, a novel two-step empirical Bayesian method combining reference single-cell/single-nuclei and bulk RNA-seq data to improve cell type-specific inference, accounting for the difference between reference and target datasets. Under comprehensive simulations, we demonstrate that EPIC-unmix outperforms alternative methods in accuracy. Applied to Alzheimer’s disease brain RNA-seq data, EPIC-unmix identifies multiple differentially expressed genes in a cell type-specific manner, and empowers cell type-specific eQTL analysis.
Citations: 0
scSpecies: enhancement of network architecture alignment in comparative single-cell studies
IF 12.3 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-11-20 DOI: 10.1186/s13059-025-03866-2
Clemens Schächter, Maren Hackenberg, Martin Treppner, Hanne Raum, Joschka Bödecker, Harald Binder
Animals can provide meaningful context for human single-cell data. To transfer information between species, we propose a deep learning approach that pre-trains a conditional variational autoencoder on animal data and transfers its final encoder layers to a human network architecture. Our approach then aligns latent spaces by leveraging data-level and model-learned similarities. We utilize this for label transfer and differential gene expression analysis in cross-species pairs of liver, adipose tissue, and glioblastoma datasets. Our results are robust even when gene sets differ, or datasets are small. Thus, we reliably exploit similarities between species to provide context for human single-cell data.
Citations: 0
Phased epigenomics and methylation inheritance in a historical Vitis vinifera hybrid
IF 12.3 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-11-17 DOI: 10.1186/s13059-025-03858-2
Noé Cochetel, Amanda M. Vondras, Rosa Figueroa-Balderas, Joel Liou, Paul Peluso, Dario Cantu
Epigenetic modifications, such as DNA methylation, regulate transcription and influence key biological traits. While many efforts were made to understand their stability in annual crops, their long-term persistence in clonally propagated plants remains poorly understood. Grapevine (Vitis vinifera) provides a unique model, with cultivars vegetatively propagated for centuries. Here, we assemble the phased genomes of Cabernet Sauvignon and its parental lineages, Cabernet Franc and Sauvignon Blanc, using HiFi long-reads and a gene map tenfold denser than existing maps. Using three clones per cultivar, we quantify methylation with very consistent short- and long-read sequencing and ensure both varietal representativeness and assessment of clonal variability. We leverage the parent-progeny sequence graph to highlight allele-specific methylation and conserved transcriptomic patterns for genes and small RNA. Such a format is essential to integrate multi-omics data and reveals that, despite less clonal conservation than genetic polymorphisms, methylation marks are remarkably inherited. By further demonstrating the linear-reference limitations, we determine that the correct representation of genetic variants by the sequence graph is crucial for the accurate allelic quantification of the methylome. These findings reveal the remarkable stability of epigenetic marks in a model propagated by asexual reproduction. Using a phased sequence graph, we introduce a scalable framework that accounts for genomic variation, accurately quantifies allele-specific methylation, and supports multi-omics integration such as our evaluation of the transcriptional impact of epigenetic inheritance. This approach has broad implications for perennial crops, where epigenetic variation could influence traits relevant to breeding, adaptation, and long-term agricultural sustainability.
Citations: 0
Double-stranded DNA deaminase DddAE1347A can increase the efficiency and targeting range of cytidine base editors
IF 12.3 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-11-17 DOI: 10.1186/s13059-025-03849-3
Yuqiang Qian, Fengjiao Hui, Wenchao Niu, Di Wang, Yang Hao, Qingying Meng, Siyu Ren, Deqiang Kong, Heng Gong, Jiayu Wu, Kexin Chen, Muna Alariqi, Junping Gao, Zhanjun Li, Shuangxia Jin
Cytidine base editors (CBEs) consist of a single-strand specific cytidine deaminase fused to Cas9 nickase, enabling efficient C-to-T conversion across diverse organisms. Enhancing editing range and efficiency of these tools is essential for expanding their applications. In this study, we report that fusing a double-stranded DNA-specific cytosine deaminase DddAE1347A to CBEs significantly improves editing activity and broadens the editing window in cell lines, embryos, tobacco, and cotton. Compared to BE4max, the optimized DddAE1347A-BE4max exhibits up to a 93-fold increase in editing efficiency, achieving up to 52% efficiency at C14 and C15 in cell lines. Further investigation reveals that DddAE1347A is compatible with various Cas9 variants (SpCas9, SpaCas9, and Nme2Cas9) and deaminase variants (rA1, A3G, and A3A). Additionally, we demonstrate that cytosine deaminases with single-stranded DNA activity fail to enhance the CBE system. In contrast, various DddA variants can improve CBE editing activity at PAM-proximal cytosine positions, highlighting the modularity of fusion between DddAs and CBEs. These findings suggest that the double-stranded DNA-specific cytosine deaminase protein can act as an engineered fusion module in the CBE system, altering the performance (window/efficiency) of CBEs.
Citations: 0
KegAlign: optimizing pairwise alignments with diagonal partitioning
IF 12.3 Zone 1 Biology Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-11-17 DOI: 10.1186/s13059-025-03830-0
A. Burak Gulhan, Richard Burhans, Robert Harris, Mahmut Kandemir, Maximilian Haeussler, Anton Nekrutenko
Advances in sequencing and assembly allow the creation of thousands of genome assemblies. However, producing multiple alignments required for their analysis lags behind due to the time-consuming process of pairwise alignment, typically performed by the slow but sensitive tool lastZ. Here, we develop KegAlign, an optimized GPU-enabled pairwise aligner. KegAlign employs a novel diagonal partitioning parallelization strategy and leverages advanced GPU features. It can compute a human/mouse alignment in under 6 h on a GPU-containing node without pre-partitioning, maintaining lastZ-level sensitivity crucial for divergent genomes. KegAlign is available as source code, a Conda package, and a user-friendly Galaxy workflow.
Citations: 0