Structure-enhanced graph meta learning for few-shot gene regulatory network inference
Genome Biology · Pub Date: 2025-11-20 · DOI: 10.1186/s13059-025-03860-8
Weiming Yu, Zhuobin Chen, Yaohua Hu, Jing Qin, Le Ou-Yang
Inferring gene regulatory networks (GRNs) is essential for understanding biological regulation. Although numerous deep learning approaches have been developed for GRN inference, most require large amounts of labeled data. We present Meta-TGLink, a structure-enhanced graph meta-learning model for few-shot GRN inference. By formulating GRN inference as a link prediction task, Meta-TGLink captures transferable regulatory patterns while reducing dependence on extensive labeled datasets. The model combines graph neural networks with Transformer architectures to integrate relational and positional information, thereby improving predictive performance under data-scarce conditions. Experiments on real datasets demonstrate its superiority over state-of-the-art baselines, particularly in cross-domain few-shot scenarios.
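Treating GRN inference as link prediction means scoring each candidate TF-target pair as a possible edge. Few-shot meta-learning then trains on many small episodes, each consisting of a small labeled support set and a query set. The sketch below covers only the episode construction and a generic dot-product edge scorer. `sample_episode` and `link_score` are hypothetical helpers, and the negative-sampling scheme (non-edges from the same TFs) is our own assumption. The GNN/Transformer encoder that Meta-TGLink uses to produce node embeddings is not reproduced.

```python
import math
import random


def sample_episode(edges, genes, k_support, k_query, seed=0):
    """Build one few-shot link-prediction episode from known TF->target edges.

    Positives are drawn from `edges`; negatives are TF->gene pairs absent from
    `edges` (an assumed sampling scheme). Support and query each hold k
    positives and k negatives, and no pair appears in both sets.
    """
    rng = random.Random(seed)
    edge_set = set(edges)
    positives = rng.sample(sorted(edge_set), k_support + k_query)
    tfs = sorted({tf for tf, _ in edge_set})
    candidates = [(tf, g) for tf in tfs for g in genes
                  if tf != g and (tf, g) not in edge_set]
    negatives = rng.sample(candidates, k_support + k_query)
    support = ([(pair, 1) for pair in positives[:k_support]]
               + [(pair, 0) for pair in negatives[:k_support]])
    query = ([(pair, 1) for pair in positives[k_support:]]
             + [(pair, 0) for pair in negatives[k_support:]])
    return support, query


def link_score(h_tf, h_target):
    """Score a TF->target link as the sigmoid of the dot product of two node
    embeddings; in a real model the embeddings come from a trained encoder."""
    logit = sum(a * b for a, b in zip(h_tf, h_target))
    return 1.0 / (1.0 + math.exp(-logit))
```

In a meta-learning loop, the model adapts on `support` and is evaluated on `query` across many such episodes. This is what lets it transfer to a new GRN that has only a handful of labeled edges.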
X-LDR: an atlas of linkage disequilibrium across species
Genome Biology · Pub Date: 2025-11-20 · DOI: 10.1186/s13059-025-03863-5
Tian-Neng Zhu, Xing Huang, Meng-Yuan Yang, Guo-An Qi, Qi-Xin Zhang, Feng Lin, Wenjing Zhang, Zhe Zhang, Xin Jin, Hou-Feng Zheng, Hai-Ming Xu, Shizhou Yu, Guo-Bo Chen
To provide a genome-scale view of linkage disequilibrium (LD), we introduce X-LDR, a stochastic algorithm for biobank-scale data with complexity $\mathcal{O}(nmB)$, where $n$ is the sample size, $m$ the number of SNPs, and $B$ the number of iterations. X-LDR scales to the entire genome with high-resolution LD grids, such as nearly 9 million LD grids for the UK Biobank ($n \approx 300{,}000$ and $m \approx 4.2$ million). We uncover various characteristics of LD in relation to biological annotation. We also present an unprecedented LD atlas for 25 reference populations that outlines the diversity of LD across species. The algorithms have been implemented in C++.
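A cost of $\mathcal{O}(nmB)$ is characteristic of randomized (Girard-Hutchinson) trace estimation. In that approach, each of $B$ random probes needs only matrix-vector products with the $n \times m$ genotype matrix. The abstract does not name its estimator, so treat this as an assumption. Under it, the sketch below estimates one LD-grid cell: the sum of squared correlations between two SNP blocks, $\sum_{i \in A, j \in B} r_{ij}^2 = \lVert X_A^\top X_B \rVert_F^2 / n^2$. Dividing by $m_A m_B$ gives the mean $r^2$. Function names are hypothetical, and the finite-sample bias of $r^2$ (roughly $1/n$ per pair) is ignored.

```python
import random


def standardize(genotypes):
    """Center and scale one SNP's genotype vector (population SD)."""
    n = len(genotypes)
    mean = sum(genotypes) / n
    sd = (sum((x - mean) ** 2 for x in genotypes) / n) ** 0.5
    if sd == 0:
        raise ValueError("monomorphic SNP cannot be standardized")
    return [(x - mean) / sd for x in genotypes]


def exact_sum_r2(XA, XB):
    """Sum of squared Pearson r over all pairs (i in block A, j in block B).
    XA and XB are lists of standardized SNP columns of length n; cost O(n*mA*mB)."""
    n = len(XA[0])
    return sum((sum(x * y for x, y in zip(a, b)) / n) ** 2 for a in XA for b in XB)


def hutchinson_sum_r2(XA, XB, iterations, seed=0):
    """Randomized estimate of ||XA^T XB||_F^2 / n^2.

    Each Rademacher probe z (length mB) yields the unbiased term
    ||XA^T (XB z)||^2 at cost O(n*(mA + mB)) instead of O(n*mA*mB).
    """
    rng = random.Random(seed)
    n = len(XA[0])
    total = 0.0
    for _ in range(iterations):
        z = [rng.choice((-1.0, 1.0)) for _ in XB]
        w = [0.0] * n  # w = XB z
        for zj, col in zip(z, XB):
            for k in range(n):
                w[k] += zj * col[k]
        u = [sum(c * wk for c, wk in zip(col, w)) for col in XA]  # u = XA^T w
        total += sum(ui * ui for ui in u)
    return total / iterations / n ** 2
```

The exact sum touches every SNP pair, whereas each probe costs only two passes over the genotypes. $B$ trades estimator variance for run time, and this is what would make millions of grid cells affordable.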
Cell type-specific inference from bulk RNA-sequencing data by integrating single-cell reference profiles via EPIC-unmix
Genome Biology · Pub Date: 2025-11-20 · DOI: 10.1186/s13059-025-03847-5
Chenwei Tang, Quan Sun, Xinyue Zeng, Gang Li, Xiaoyu Yang, Fei Liu, Jinying Zhao, Yin Shen, Boxiang Liu, Jia Wen, Yun Li
Cell type-specific analysis is crucial for uncovering biological insights hidden in bulk tissue data, yet single-cell or single-nuclei approaches are often cost-prohibitive for large samples. We introduce EPIC-unmix, a novel two-step empirical Bayesian method that combines reference single-cell/single-nuclei and bulk RNA-seq data to improve cell type-specific inference while accounting for differences between reference and target datasets. In comprehensive simulations, we demonstrate that EPIC-unmix outperforms alternative methods in accuracy. Applied to Alzheimer’s disease brain RNA-seq data, EPIC-unmix identifies multiple differentially expressed genes in a cell type-specific manner and enables cell type-specific eQTL analysis.
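A Gaussian conjugate update is one common way to combine a reference-derived prior with bulk data. The reference supplies a mean and variance for each cell type's expression. The bulk measurement, modeled as a fraction-weighted mixture of those cell types, then pulls the estimates toward agreement with what was observed. The sketch below assumes known cell type fractions and a single-gene normal model. It illustrates the empirical Bayes idea only and is not EPIC-unmix's two-step procedure, which additionally accounts for differences between reference and target data. The function name is hypothetical.

```python
def posterior_cell_type_expression(bulk, fractions, prior_means, prior_vars, noise_var):
    """Posterior mean of per-cell-type expression for one gene in one bulk sample.

    Assumed model (illustrative only):
        x_k ~ Normal(prior_means[k], prior_vars[k])  independently, from the reference
        bulk = sum_k fractions[k] * x_k + Normal(0, noise_var)
    Gaussian conditioning gives
        E[x_k | bulk] = m_k + f_k * v_k * (bulk - sum_l f_l m_l) / (sum_l f_l^2 v_l + noise_var)
    """
    predicted = sum(f * m for f, m in zip(fractions, prior_means))
    bulk_var = sum(f * f * v for f, v in zip(fractions, prior_vars)) + noise_var
    residual = bulk - predicted
    return [m + f * v * residual / bulk_var
            for f, m, v in zip(fractions, prior_means, prior_vars)]
```

As an example, take fractions 0.6/0.4, prior means 10 and 2, prior variances 4 and 1, a bulk value of 8, and no noise. The posterior means are 11.8 and 2.3. The cell type with the larger prior variance absorbs more of the residual, and the fraction-weighted posterior reproduces the bulk value exactly.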
scSpecies: enhancement of network architecture alignment in comparative single-cell studies
Genome Biology · Pub Date: 2025-11-20 · DOI: 10.1186/s13059-025-03866-2
Clemens Schächter, Maren Hackenberg, Martin Treppner, Hanne Raum, Joschka Bödecker, Harald Binder
Animals can provide meaningful context for human single-cell data. To transfer information between species, we propose a deep learning approach that pre-trains a conditional variational autoencoder on animal data and transfers its final encoder layers to a human network architecture. Our approach then aligns latent spaces by leveraging data-level and model-learned similarities. We use this for label transfer and differential gene expression analysis in cross-species pairs of liver, adipose tissue, and glioblastoma datasets. Our results are robust even when gene sets differ or datasets are small. Thus, we reliably exploit similarities between species to provide context for human single-cell data.
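Once animal (reference) and human (query) cells share an aligned latent space, label transfer reduces to a neighbor lookup. The sketch below assumes the latent coordinates are already available and uses plain k-nearest-neighbor majority voting as a stand-in. scSpecies' own alignment, which combines data-level and model-learned similarities, is not reproduced. The function name and cell labels are hypothetical.

```python
import math
from collections import Counter


def knn_label_transfer(query_latent, reference_latent, reference_labels, k=3):
    """Give each query cell the majority label among its k nearest reference
    cells (Euclidean distance) in a shared latent space."""
    transferred = []
    for q in query_latent:
        nearest = sorted(range(len(reference_latent)),
                         key=lambda i: math.dist(q, reference_latent[i]))[:k]
        votes = Counter(reference_labels[i] for i in nearest)
        transferred.append(votes.most_common(1)[0][0])
    return transferred
```

The quality of the transferred labels depends entirely on how well the latent spaces are aligned. The alignment step is where the cross-species method does its real work.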
Phased epigenomics and methylation inheritance in a historical Vitis vinifera hybrid
Genome Biology · Pub Date: 2025-11-17 · DOI: 10.1186/s13059-025-03858-2
Noé Cochetel, Amanda M. Vondras, Rosa Figueroa-Balderas, Joel Liou, Paul Peluso, Dario Cantu
Epigenetic modifications, such as DNA methylation, regulate transcription and influence key biological traits. While many efforts have been made to understand their stability in annual crops, their long-term persistence in clonally propagated plants remains poorly understood. Grapevine (Vitis vinifera) provides a unique model, with cultivars vegetatively propagated for centuries. Here, we assemble the phased genomes of Cabernet Sauvignon and its parental lineages, Cabernet Franc and Sauvignon Blanc, using HiFi long reads and a gene map tenfold denser than existing maps. Using three clones per cultivar, we quantify methylation with highly consistent short- and long-read sequencing, ensuring both varietal representativeness and an assessment of clonal variability. We leverage the parent-progeny sequence graph to highlight allele-specific methylation and conserved transcriptomic patterns for genes and small RNAs. This format is essential for integrating multi-omics data and reveals that, although methylation marks are less conserved across clones than genetic polymorphisms, they are remarkably well inherited. By further demonstrating the limitations of a linear reference, we show that correctly representing genetic variants in the sequence graph is crucial for accurate allelic quantification of the methylome. These findings reveal the remarkable stability of epigenetic marks in a model propagated by asexual reproduction. Using a phased sequence graph, we introduce a scalable framework that accounts for genomic variation, accurately quantifies allele-specific methylation, and supports multi-omics integration, such as our evaluation of the transcriptional impact of epigenetic inheritance. This approach has broad implications for perennial crops, where epigenetic variation could influence traits relevant to breeding, adaptation, and long-term agricultural sustainability.
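Once reads are assigned to haplotypes through a phased reference, allele-specific methylation at a CpG can be tested by comparing methylated and unmethylated read counts between the two haplotypes. The sketch below uses Fisher's exact test, a common choice for such 2x2 comparisons. It is an assumption and not necessarily the statistic used in the study. The function names are hypothetical, as are the counts in the usage note.

```python
from math import comb


def methylation_level(methylated, unmethylated):
    """Fraction of reads methylated at a CpG."""
    return methylated / (methylated + unmethylated)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are haplotypes; columns are methylated/unmethylated read counts.
    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x):
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

Suppose haplotype 1 has 8 methylated and 2 unmethylated reads, and haplotype 2 has 1 and 5. The methylation levels are 0.80 and 0.17, and `fisher_exact_two_sided(8, 2, 1, 5)` gives p ≈ 0.035. Reads mis-assigned by a linear reference would blur exactly this kind of haplotype contrast.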
Double-stranded DNA deaminase DddAE1347A can increase the efficiency and targeting range of cytidine base editors
Genome Biology · Pub Date: 2025-11-17 · DOI: 10.1186/s13059-025-03849-3
Yuqiang Qian, Fengjiao Hui, Wenchao Niu, Di Wang, Yang Hao, Qingying Meng, Siyu Ren, Deqiang Kong, Heng Gong, Jiayu Wu, Kexin Chen, Muna Alariqi, Junping Gao, Zhanjun Li, Shuangxia Jin
Cytidine base editors (CBEs) consist of a single-strand-specific cytidine deaminase fused to a Cas9 nickase, enabling efficient C-to-T conversion across diverse organisms. Enhancing the editing range and efficiency of these tools is essential for expanding their applications. In this study, we report that fusing the double-stranded DNA-specific cytosine deaminase DddAE1347A to CBEs significantly improves editing activity and broadens the editing window in cell lines, embryos, tobacco, and cotton. Compared to BE4max, the optimized DddAE1347A-BE4max exhibits up to a 93-fold increase in editing efficiency, achieving up to 52% efficiency at positions C14 and C15 in cell lines. Further investigation reveals that DddAE1347A is compatible with various Cas9 variants (SpCas9, SpaCas9, and Nme2Cas9) and deaminase variants (rA1, A3G, and A3A). Additionally, we demonstrate that cytosine deaminases with single-stranded DNA activity fail to enhance the CBE system. In contrast, various DddA variants can improve CBE editing activity at PAM-proximal cytosine positions, highlighting the modularity of fusing DddAs to CBEs. These findings suggest that the double-stranded DNA-specific cytosine deaminase protein can act as an engineered fusion module in the CBE system, altering the performance (window and efficiency) of CBEs.
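Editing efficiencies such as the 52% at C14 and C15 are typically reported as the percentage of sequenced reads carrying a C-to-T change at each protospacer position. Fold increases are ratios of such percentages. The sketch below shows that bookkeeping under stated assumptions: a 20-nt SpCas9-style protospacer numbered from the PAM-distal end, and reads pre-aligned without indels. The function names, sequence, and reads are hypothetical. Real amplicon pipelines also handle indels and quality filtering.

```python
def c_to_t_efficiency(protospacer, reads):
    """Percent of reads with T at each C of a 20-nt protospacer.

    Positions are 1-based from the PAM-distal end, so the PAM follows
    position 20 and C14/C15 are PAM-proximal. Reads are assumed to be
    aligned to the protospacer, of equal length, and free of indels.
    """
    efficiency = {}
    for pos, base in enumerate(protospacer, start=1):
        if base == "C":
            edited = sum(1 for read in reads if read[pos - 1] == "T")
            efficiency[pos] = 100.0 * edited / len(reads)
    return efficiency


def fold_change(treated_pct, reference_pct):
    """Ratio of editing efficiencies, e.g. a fusion editor versus BE4max."""
    if reference_pct <= 0:
        raise ValueError("reference efficiency must be positive")
    return treated_pct / reference_pct
```

A wider editing window shows up in this representation as nonzero efficiencies at more C positions, not just a higher value at a single position.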
KegAlign: optimizing pairwise alignments with diagonal partitioning
Genome Biology · Pub Date: 2025-11-17 · DOI: 10.1186/s13059-025-03830-0
A. Burak Gulhan, Richard Burhans, Robert Harris, Mahmut Kandemir, Maximilian Haeussler, Anton Nekrutenko
Advances in sequencing and assembly allow the creation of thousands of genome assemblies. However, producing the multiple alignments required for their analysis lags behind, because pairwise alignment, typically performed with the slow but sensitive tool lastZ, is time-consuming. Here, we develop KegAlign, an optimized GPU-enabled pairwise aligner. KegAlign employs a novel diagonal partitioning parallelization strategy and leverages advanced GPU features. It can compute a human/mouse alignment in under 6 h on a GPU-containing node without pre-partitioning, while maintaining lastZ-level sensitivity, which is crucial for divergent genomes. KegAlign is available as source code, a Conda package, and a user-friendly Galaxy workflow.
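One way to picture diagonal partitioning: seed hits between a query and a target lie on diagonals d = target position - query position, and hits on well-separated diagonals can be processed independently. The toy sketch below groups hits into bands of consecutive diagonals and processes the bands concurrently. It illustrates that concept only, under our own assumptions (the band width, a thread pool, and placeholder per-band work), and is not KegAlign's GPU implementation.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by_diagonal(hits, band_width):
    """Group seed hits (query_pos, target_pos) into bands of band_width
    consecutive diagonals, where diagonal = target_pos - query_pos."""
    bands = defaultdict(list)
    for q, t in hits:
        # Floor division keeps negative diagonals in contiguous bands too.
        bands[(t - q) // band_width].append((q, t))
    return dict(bands)


def process_band(item):
    """Placeholder for per-band work such as seed extension; here it only
    orders the band's hits by query position."""
    band_id, band_hits = item
    return band_id, sorted(band_hits)


def process_in_parallel(hits, band_width, workers=4):
    """Hand independent bands to separate workers and collect the results."""
    bands = partition_by_diagonal(hits, band_width)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_band, bands.items()))
```

A real aligner also has to handle gapped extensions that drift across band boundaries. How KegAlign resolves that is beyond what the abstract states.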
ricePSP: a database of rice phase separation-associated proteins
Genome Biology · Pub Date: 2025-11-17 · DOI: 10.1186/s13059-025-03842-w
Runxin Gao, Minrong Guo, Shiping Luan, Weizhi Ouyang, Xingwang Li
Multivalent interactions between proteins with intrinsically disordered regions or prion-like domains can drive liquid–liquid phase separation (LLPS) and form membraneless condensates essential for diverse cellular functions. Here, we predict phase separation scores for all annotated rice proteins and present ricePSP (https://ricepsp.github.io/), a database of phase separation-associated proteins. AlphaFold structural predictions further validate the phase separation potential of these proteins. As a proof of concept, we apply ricePSP to identify flowering-related phase separation proteins, revealing insights into how LLPS may regulate flowering. Collectively, ricePSP provides a valuable resource for studying crop phase separation proteins and LLPS-related mechanisms in crop trait regulation.
DeepCOI: a large language model-driven framework for fast and accurate taxonomic assignment in animal metabarcoding
Genome Biology · Pub Date: 2025-11-17 · DOI: 10.1186/s13059-025-03861-7
Ho-Jin Gwak, Mina Rho
Metabarcoding remains challenging due to incomplete taxonomic annotations and computationally intensive processes. We present DeepCOI, a large language model-based classifier pre-trained on seven million cytochrome c oxidase I gene sequences. DeepCOI enables fast and accurate taxonomic assignment across eight major phyla, achieving an AU-ROC of 0.958 and an AU-PR of 0.897, outperforming existing methods while significantly reducing inference time. Additionally, DeepCOI demonstrates interpretability by identifying taxonomically informative sequence positions. By integrating large-scale datasets and self-supervised learning, DeepCOI enhances both the accuracy and efficiency of metabarcoding processes, providing a scalable solution for biodiversity assessment and environmental monitoring.
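AU-ROC and AU-PR summarize how well a classifier's scores rank true taxa above false ones. For a multi-class taxonomic assigner, they are usually computed one-vs-rest per taxon and then averaged. The abstract does not say how DeepCOI aggregated them, so treat that as an assumption. The sketch below computes both metrics for a single binary task, summarizing AU-PR as average precision. This is one common estimator; others interpolate the precision-recall curve differently. Function names, labels, and scores are hypothetical.

```python
def auroc(labels, scores):
    """AU-ROC via the Mann-Whitney formulation: the probability that a random
    positive scores higher than a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """AU-PR summarized as average precision: the mean precision at the rank
    of each positive after sorting by descending score (ties keep input order)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive")
    return total / hits
```

For labels [1, 0, 1, 0] with scores [0.9, 0.8, 0.7, 0.1], AU-ROC is 0.75 and average precision is 5/6. AU-PR is the stricter of the two when positives are rare, which is common for fine-grained taxa.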
A comparison of computational methods for expression forecasting
Genome Biology · Pub Date: 2025-11-17 · DOI: 10.1186/s13059-025-03840-y
Eric Kernfeld, Yunxiao Yang, Joshua S. Weinstock, Alexis Battle, Patrick Cahan
Diverse machine learning methods promise to forecast gene expression changes in response to novel genetic perturbations. However, these methods’ accuracy is not well characterized. We created a benchmarking platform that combines a panel of 11 large-scale perturbation datasets with an expression forecasting software engine that encompasses or interfaces to a wide variety of methods. We used our platform to assess methods, parameters, and sources of auxiliary data, finding that it is uncommon for expression forecasting methods to outperform simple baselines. Our platform will serve as a resource to improve methods and to identify contexts in which expression forecasting can succeed.
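Whether a forecasting method "outperforms simple baselines" depends on which baselines are used. The sketch below compares a forecast for one held-out perturbation against two baselines often used in this setting: predicting no change from the control profile, and predicting the mean profile across training perturbations. It scores all three by mean absolute error. The choice of baselines and metric is our assumption for illustration, and the paper's exact definitions may differ. The function names and values are hypothetical.

```python
def mean_absolute_error(predicted, observed):
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def compare_to_baselines(method_prediction, observed, control, training_profiles):
    """MAE of a forecast versus two simple baselines for one held-out perturbation.

    no_change:   predict the unperturbed control profile.
    mean_effect: predict the average profile across training perturbations.
    """
    n_genes = len(control)
    mean_profile = [sum(profile[g] for profile in training_profiles) / len(training_profiles)
                    for g in range(n_genes)]
    return {
        "method": mean_absolute_error(method_prediction, observed),
        "no_change": mean_absolute_error(control, observed),
        "mean_effect": mean_absolute_error(mean_profile, observed),
    }
```

A method earns credit only when its error falls below the best baseline across many held-out perturbations. The mean-effect baseline in particular is hard to beat when most perturbations shift expression in similar ways.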