The rise of genomic sequencing raises privacy concerns due to the identifiable nature of genomic data. The GA4GH Beacon Project enables privacy-preserving data sharing but is vulnerable to membership inference attacks that reveal individual participation. Existing defenses, such as noise addition and query restrictions, rely on static policies that attackers can bypass. We introduce the first reinforcement learning (RL)-based dynamic defense for the beacon protocol, training defender and attacker agents in a multiplayer setting. Our approach adapts responses in real time, distinguishing users from adversaries and balancing privacy with utility against evolving threats.
{"title":"A reinforcement learning-based approach for dynamic privacy protection in genomic data sharing beacons","authors":"Masoud Poorghaffar Aghdam, Sobhan Shukueian Tabrizi, Kerem Ayöz, Erman Ayday, Sinem Sav, A. Ercüment Çiçek","doi":"10.1186/s13059-025-03871-5","DOIUrl":"https://doi.org/10.1186/s13059-025-03871-5","url":null,"PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"143 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145583714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
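To make the threat model described in the beacon abstract concrete, here is a minimal sketch of a beacon and of a naive membership-inference probe against it. Everything in it is illustrative: the `ToyBeacon` class, the `(position, allele)` encoding, and the `membership_score` heuristic are hypothetical names, not the GA4GH Beacon API and not the authors' RL-based defense. The fixed query budget stands in for the kind of static policy the abstract says attackers can work around.

```python
from typing import Iterable, Set, Tuple

Variant = Tuple[int, str]  # (position, alternate allele); a hypothetical encoding


class ToyBeacon:
    """Answers 'does any genome in the dataset carry this variant?' with yes/no."""

    def __init__(self, genomes: Iterable[Set[Variant]], query_budget: int = 100):
        # The beacon only needs the union of all variants, not who carries them.
        self._variants: Set[Variant] = set().union(*genomes)
        self._budget = query_budget  # static defense: a fixed cap on queries

    def query(self, variant: Variant) -> bool:
        if self._budget <= 0:
            raise PermissionError("query budget exhausted")
        self._budget -= 1
        return variant in self._variants


def membership_score(beacon: ToyBeacon, target_genome: Set[Variant]) -> float:
    """Fraction of the target's variants that the beacon confirms (naive heuristic)."""
    answers = [beacon.query(v) for v in sorted(target_genome)]
    return sum(answers) / len(answers)


# Toy dataset of two participants.
dataset = [{(101, "A"), (250, "T")}, {(101, "A"), (377, "G"), (912, "C")}]
member = dataset[1]
outsider = {(101, "A"), (555, "T"), (600, "G")}

print(membership_score(ToyBeacon(dataset), member))    # every variant confirmed
print(membership_score(ToyBeacon(dataset), outsider))  # only the shared variant confirmed
```

In this toy setting the budget only limits how many variants can be probed. The shared variant `(101, "A")` still returns yes for the outsider, which is why practical attacks tend to weight rare alleles more heavily than common ones.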
Pub Date: 2025-11-21; DOI: 10.1186/s13059-025-03873-3
Karina Ray, Christina Mulch, Samuel M. Peterson, Sebastian Benjamin, Nathan Gullicksrud, Adam J. Ericsen, Eric J. Vallender, Betsy M. Ferguson, Jeffrey D. Wall, Benjamin N. Bimber
Due to their close evolutionary relationship with humans, rhesus macaques are an important pre-clinical model. While genetic diversity driven by short nucleotide variation has long been studied in rhesus macaques, comparatively little is known about structural variation, with most published studies focused on cross-species comparative analyses. Understanding the degree and implications of intraspecies structural variation is essential to all biomedical research using rhesus macaques as a model. Here we present long-read sequencing of 59 rhesus macaques, identifying a catalog of 339,334 structural variants (SVs), which we subsequently genotype in a cohort of 2,645 individuals with short-read whole-genome sequencing data to create the largest public dataset of rhesus macaque SVs. These data reveal population structure within rhesus macaque SVs based on both geographic ancestry and, to a lesser degree, breeding center. While there is evidence of strong purifying selection against SVs within exons, 0.7% of SVs overlap exons, with an average of 16.9 rare SVs per subject predicted to have a high impact on protein-coding sequences. Notably, rhesus macaque SVs are dominated by Alu retrotransposition events, which comprise 55.7% of SVs and suggest significantly different modes of SV formation relative to humans and great apes. This dataset represents the largest study of structural variation in rhesus macaques to date and demonstrates the use of both long- and short-read datasets to generate SV genotype data. These data enable the consideration of structural variation impact in rhesus macaque-based research and will also aid the development of primate pangenomes.
{"title":"Long-read structural variant discovery and targeted short read genotyping enables population scale characterization of structural variation in rhesus macaques","authors":"Karina Ray, Christina Mulch, Samuel M. Peterson, Sebastian Benjamin, Nathan Gullicksrud, Adam J. Ericsen, Eric J. Vallender, Betsy M. Ferguson, Jeffrey D. Wall, Benjamin N. Bimber","doi":"10.1186/s13059-025-03873-3","DOIUrl":"https://doi.org/10.1186/s13059-025-03873-3","url":null,"PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"11 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145559410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Advancements in single-cell RNA sequencing have enabled the analysis of millions of cells, but integrating such data across samples and methods while mitigating batch effects remains challenging. Deep learning approaches address this by learning biologically conserved gene expression representations, yet systematic benchmarking of loss functions and integration performance is lacking. We evaluate 16 integration methods using a unified variational autoencoder framework, incorporating batch and cell-type information. Results reveal limitations in the single-cell integration benchmarking index (scIB) for preserving intra-cell-type information. To address this, we introduce a correlation-based loss function and enhance benchmarking metrics to better capture biological conservation. Using cell annotations from lung and breast atlases, our approach improves biological signal preservation. We propose a refined integration framework, scIB-E, and metrics that provide deeper insights into the integration process and offer guidance for advanced developments in integrating increasingly complex single-cell data. This benchmark highlights the potential of deep learning-based approaches for single-cell data integration, emphasizing the importance of biologically informed metrics and improved benchmarking strategies.
{"title":"Benchmarking deep learning methods for biologically conserved single-cell integration","authors":"Chenxin Yi, Jinyu Cheng, Jiajun Chen, Wanquan Liu, Junwei Liu, Yixue Li","doi":"10.1186/s13059-025-03869-z","DOIUrl":"https://doi.org/10.1186/s13059-025-03869-z","url":null,"PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"177 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145554403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-20; DOI: 10.1186/s13059-025-03860-8
Weiming Yu, Zhuobin Chen, Yaohua Hu, Jing Qin, Le Ou-Yang
Inferring gene regulatory networks (GRNs) is essential for understanding biological regulation. Although numerous deep learning approaches have been developed for GRN inference, most require large amounts of labeled data. We present Meta-TGLink, a structure-enhanced graph meta-learning model for few-shot GRN inference. By formulating GRN inference as a link prediction task, Meta-TGLink captures transferable regulatory patterns while reducing dependence on extensive labeled datasets. The model combines graph neural networks with Transformer architectures to integrate relational and positional information, thereby improving predictive performance under data-scarce conditions. Experiments on real datasets demonstrate its superiority over state-of-the-art baselines, particularly in cross-domain few-shot scenarios.
{"title":"Structure-enhanced graph meta learning for few-shot gene regulatory network inference","authors":"Weiming Yu, Zhuobin Chen, Yaohua Hu, Jing Qin, Le Ou-Yang","doi":"10.1186/s13059-025-03860-8","DOIUrl":"https://doi.org/10.1186/s13059-025-03860-8","url":null,"PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"135 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145554407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
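The link-prediction framing mentioned in the Meta-TGLink abstract can be sketched generically as follows. This is an illustration only, assuming each gene already has an embedding vector. The dot-product scorer, the function names `edge_scores` and `edge_loss`, and the toy embeddings are hypothetical; the sketch does not reproduce Meta-TGLink's graph neural network, its Transformer components, or its meta-learning procedure.

```python
import numpy as np


def edge_scores(emb: np.ndarray, edges: np.ndarray) -> np.ndarray:
    """Probability that each (regulator, target) pair is a true regulatory link."""
    src, dst = edges[:, 0], edges[:, 1]
    logits = np.sum(emb[src] * emb[dst], axis=1)  # dot product per candidate edge
    return 1.0 / (1.0 + np.exp(-logits))          # sigmoid


def edge_loss(emb: np.ndarray, pos_edges: np.ndarray, neg_edges: np.ndarray,
              eps: float = 1e-12) -> float:
    """Binary cross-entropy over known links (positives) and sampled non-links."""
    p_pos = edge_scores(emb, pos_edges)
    p_neg = edge_scores(emb, neg_edges)
    total = np.log(p_pos + eps).sum() + np.log(1.0 - p_neg + eps).sum()
    return float(-total / (len(p_pos) + len(p_neg)))


# Toy example: gene 0 is a regulator of genes 1 and 2, but not of gene 3.
emb = np.array([[2.0, 0.0], [1.5, 0.1], [1.8, -0.2], [-1.0, 0.5]])
pos = np.array([[0, 1], [0, 2]])
neg = np.array([[0, 3]])

print(edge_loss(emb, pos, neg))                 # low: embeddings agree with the labels
print(edge_loss(np.zeros_like(emb), pos, neg))  # uninformative embeddings give log(2)
```

A dot product is symmetric, so this scorer cannot tell regulator from target. A directed network would need an asymmetric scorer, for example separate source and target embeddings per gene.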
To illustrate linkage disequilibrium (LD) at genomic scale, we introduce X-LDR, a stochastic algorithm for biobank-scale data ($\mathcal{O}(nmB)$, where $n$ is the sample size, $m$ the number of SNPs, and $B$ the number of iterations). X-LDR can resolve the entire genome into high-resolution LD grids, such as nearly 9 million LD grids for UK Biobank ($n \approx 300{,}000$ and $m \approx 4.2$ million). Various characteristics of LD are discovered in terms of their biological annotation. We also present an unprecedented LD atlas for 25 reference populations that charts the diversity of interspecies LD. The algorithms have been implemented in C++.
{"title":"X-LDR: an atlas of linkage disequilibrium across species","authors":"Tian-Neng Zhu, Xing Huang, Meng-Yuan Yang, Guo-An Qi, Qi-Xin Zhang, Feng Lin, Wenjing Zhang, Zhe Zhang, Xin Jin, Hou-Feng Zheng, Hai-Ming Xu, Shizhou Yu, Guo-Bo Chen","doi":"10.1186/s13059-025-03863-5","DOIUrl":"https://doi.org/10.1186/s13059-025-03863-5","url":null,"PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"28 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145554408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
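The X-LDR abstract quotes a cost linear in the sample size, the number of SNPs, and the number of iterations, but does not spell out the estimator. One standard way to reach that cost is Hutchinson-style randomized trace estimation, sketched below in Python as an assumption rather than as the authors' C++ implementation. With a standardized genotype matrix X (n samples by m SNPs) and K = X Xᵀ, the mean squared correlation over all SNP pairs equals tr(K²) / (n m)², and tr(K²) can be estimated from B random sign vectors without ever forming the m-by-m LD matrix. The function names and the simulated genotypes are hypothetical.

```python
import numpy as np


def mean_ld_exact(G: np.ndarray) -> float:
    """Mean r^2 over all SNP pairs (diagonal included), via the full m x m matrix."""
    R = np.corrcoef(G, rowvar=False)
    return float(np.mean(R ** 2))


def mean_ld_stochastic(G: np.ndarray, B: int = 500, seed: int = 0) -> float:
    """Randomized estimate of the same quantity; each iteration costs O(n*m)."""
    rng = np.random.default_rng(seed)
    n, m = G.shape
    X = (G - G.mean(axis=0)) / G.std(axis=0)   # standardize each SNP column
    total = 0.0
    for _ in range(B):
        z = rng.choice([-1.0, 1.0], size=n)    # Rademacher probe vector
        Kz = X @ (X.T @ z)                     # K z without forming K = X X^T
        total += Kz @ Kz                       # z^T K^2 z, unbiased for tr(K^2)
    return total / B / (n * m) ** 2


def simulate_genotypes(n: int, m: int, seed: int = 1) -> np.ndarray:
    """Independent 0/1/2 genotypes with allele frequencies between 0.1 and 0.5."""
    rng = np.random.default_rng(seed)
    maf = rng.uniform(0.1, 0.5, size=m)
    return rng.binomial(2, maf, size=(n, m)).astype(float)


G = simulate_genotypes(n=200, m=50)
print(mean_ld_exact(G), mean_ld_stochastic(G, B=500))  # the two values should be close
```

The exact version needs memory and time quadratic in the number of SNPs, while the randomized version only touches the genotype matrix through matrix-vector products. Applying the same estimator to blocks of SNPs would give a grid of regional LD values, which is one plausible reading of the "LD grids" in the abstract. Monomorphic SNPs have zero variance and would need to be filtered before standardizing.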
Pub Date: 2025-11-20; DOI: 10.1186/s13059-025-03847-5
Chenwei Tang, Quan Sun, Xinyue Zeng, Gang Li, Xiaoyu Yang, Fei Liu, Jinying Zhao, Yin Shen, Boxiang Liu, Jia Wen, Yun Li
Cell type-specific analysis is crucial for uncovering biological insights hidden in bulk tissue data, yet single-cell or single-nuclei approaches are often cost-prohibitive for large samples. We introduce EPIC-unmix, a novel two-step empirical Bayesian method combining reference single-cell/single-nuclei and bulk RNA-seq data to improve cell type-specific inference, accounting for the difference between reference and target datasets. Under comprehensive simulations, we demonstrate that EPIC-unmix outperforms alternative methods in accuracy. Applied to Alzheimer’s disease brain RNA-seq data, EPIC-unmix identifies multiple differentially expressed genes in a cell type-specific manner, and empowers cell type-specific eQTL analysis.
{"title":"Cell type-specific inference from bulk RNA-sequencing data by integrating single-cell reference profiles via EPIC-unmix","authors":"Chenwei Tang, Quan Sun, Xinyue Zeng, Gang Li, Xiaoyu Yang, Fei Liu, Jinying Zhao, Yin Shen, Boxiang Liu, Jia Wen, Yun Li","doi":"10.1186/s13059-025-03847-5","DOIUrl":"https://doi.org/10.1186/s13059-025-03847-5","url":null,"PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"19 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145554406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-20; DOI: 10.1186/s13059-025-03866-2
Clemens Schächter, Maren Hackenberg, Martin Treppner, Hanne Raum, Joschka Bödecker, Harald Binder
Animals can provide meaningful context for human single-cell data. To transfer information between species, we propose a deep learning approach that pre-trains a conditional variational autoencoder on animal data and transfers its final encoder layers to a human network architecture. Our approach then aligns latent spaces by leveraging data-level and model-learned similarities. We utilize this for label transfer and differential gene expression analysis in cross-species pairs of liver, adipose tissue, and glioblastoma datasets. Our results are robust even when gene sets differ, or datasets are small. Thus, we reliably exploit similarities between species to provide context for human single-cell data.
{"title":"scSpecies: enhancement of network architecture alignment in comparative single-cell studies","authors":"Clemens Schächter, Maren Hackenberg, Martin Treppner, Hanne Raum, Joschka Bödecker, Harald Binder","doi":"10.1186/s13059-025-03866-2","DOIUrl":"https://doi.org/10.1186/s13059-025-03866-2","url":null,"PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"8 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145554475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-17; DOI: 10.1186/s13059-025-03858-2
Noé Cochetel, Amanda M. Vondras, Rosa Figueroa-Balderas, Joel Liou, Paul Peluso, Dario Cantu
Epigenetic modifications, such as DNA methylation, regulate transcription and influence key biological traits. While many efforts were made to understand their stability in annual crops, their long-term persistence in clonally propagated plants remains poorly understood. Grapevine (Vitis vinifera) provides a unique model, with cultivars vegetatively propagated for centuries. Here, we assemble the phased genomes of Cabernet Sauvignon and its parental lineages, Cabernet Franc and Sauvignon Blanc, using HiFi long-reads and a gene map tenfold denser than existing maps. Using three clones per cultivar, we quantify methylation with very consistent short- and long-read sequencing and ensure both varietal representativeness and assessment of clonal variability. We leverage the parent-progeny sequence graph to highlight allele-specific methylation and conserved transcriptomic patterns for genes and small RNA. Such a format is essential to integrate multi-omics data and reveals that, despite less clonal conservation than genetic polymorphisms, methylation marks are remarkably inherited. By further demonstrating the linear-reference limitations, we determine that the correct representation of genetic variants by the sequence graph is crucial for the accurate allelic quantification of the methylome. These findings reveal the remarkable stability of epigenetic marks in a model propagated by asexual reproduction. Using a phased sequence graph, we introduce a scalable framework that accounts for genomic variation, accurately quantifies allele-specific methylation, and supports multi-omics integration such as our evaluation of the transcriptional impact of epigenetic inheritance. This approach has broad implications for perennial crops, where epigenetic variation could influence traits relevant to breeding, adaptation, and long-term agricultural sustainability.
{"title":"Phased epigenomics and methylation inheritance in a historical Vitis vinifera hybrid","authors":"Noé Cochetel, Amanda M. Vondras, Rosa Figueroa-Balderas, Joel Liou, Paul Peluso, Dario Cantu","doi":"10.1186/s13059-025-03858-2","DOIUrl":"https://doi.org/10.1186/s13059-025-03858-2","url":null,"PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"22 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145531557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-17; DOI: 10.1186/s13059-025-03849-3
Yuqiang Qian, Fengjiao Hui, Wenchao Niu, Di Wang, Yang Hao, Qingying Meng, Siyu Ren, Deqiang Kong, Heng Gong, Jiayu Wu, Kexin Chen, Muna Alariqi, Junping Gao, Zhanjun Li, Shuangxia Jin
Cytidine base editors (CBEs) consist of a single-strand specific cytidine deaminase fused to Cas9 nickase, enabling efficient C-to-T conversion across diverse organisms. Enhancing editing range and efficiency of these tools is essential for expanding their applications. In this study, we report that fusing a double-stranded DNA-specific cytosine deaminase DddAE1347A to CBEs significantly improves editing activity and broadens the editing window in cell lines, embryos, tobacco, and cotton. Compared to BE4max, the optimized DddAE1347A-BE4max exhibits up to a 93-fold increase in editing efficiency, achieving up to 52% efficiency at C14 and C15 in cell lines. Further investigation reveals that DddAE1347A is compatible with various Cas9 variants (SpCas9, SpaCas9, and Nme2Cas9) and deaminase variants (rA1, A3G, and A3A). Additionally, we demonstrate that cytosine deaminases with single-stranded DNA activity fail to enhance the CBE system. In contrast, various DddA variants can improve CBE editing activity at PAM-proximal cytosine positions, highlighting the modularity of fusion between DddAs and CBEs. These findings suggest that the double-stranded DNA-specific cytosine deaminase protein can act as an engineered fusion module in the CBE system, altering the performance (window/efficiency) of CBEs.
{"title":"Double-stranded DNA deaminase DddAE1347A can increase the efficiency and targeting range of cytidine base editors","authors":"Yuqiang Qian, Fengjiao Hui, Wenchao Niu, Di Wang, Yang Hao, Qingying Meng, Siyu Ren, Deqiang Kong, Heng Gong, Jiayu Wu, Kexin Chen, Muna Alariqi, Junping Gao, Zhanjun Li, Shuangxia Jin","doi":"10.1186/s13059-025-03849-3","DOIUrl":"https://doi.org/10.1186/s13059-025-03849-3","url":null,"PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"174 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145531558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-17; DOI: 10.1186/s13059-025-03830-0
A. Burak Gulhan, Richard Burhans, Robert Harris, Mahmut Kandemir, Maximilian Haeussler, Anton Nekrutenko
Advances in sequencing and assembly allow the creation of thousands of genome assemblies. However, producing multiple alignments required for their analysis lags behind due to the time-consuming process of pairwise alignment, typically performed by the slow but sensitive tool lastZ. Here, we develop KegAlign, an optimized GPU-enabled pairwise aligner. KegAlign employs a novel diagonal partitioning parallelization strategy and leverages advanced GPU features. It can compute a human/mouse alignment in under 6 h on a GPU-containing node without pre-partitioning, maintaining lastZ-level sensitivity crucial for divergent genomes. KegAlign is available as source code, a Conda package, and a user-friendly Galaxy workflow.
{"title":"KegAlign: optimizing pairwise alignments with diagonal partitioning","authors":"A. Burak Gulhan, Richard Burhans, Robert Harris, Mahmut Kandemir, Maximilian Haeussler, Anton Nekrutenko","doi":"10.1186/s13059-025-03830-0","DOIUrl":"https://doi.org/10.1186/s13059-025-03830-0","url":null,"PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"1 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145531560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}