Pub Date: 2025-12-01. DOI: 10.1186/s13059-025-03882-2
Youngbeen Moon, Young-Ho Kim, Jong-Kwang Kim, Chung Hwan Hong, Eun-Kyung Kang, Hye Won Choi, Dong-Eun Lee, Tae-Min Kim, Seong Gu Heo, Namshik Han, Kyeong-Man Hong
Background: Next-generation sequencing (NGS) has become an indispensable diagnostic tool across various diseases. However, sequencing and analysis errors remain major barriers to clinical implementation. In cancer diagnostics, detecting low-level somatic variants is particularly challenging due to tumor heterogeneity and contamination from normal cells.
Results: We assess targeted next-generation sequencing (T-NGS) performance using reference-standard DNA mixtures of homozygous hydatidiform mole DNA and heterozygous blood DNA at varying ratios, analyzed by certified NGS providers. Analytical sensitivity differs by up to 13.9-fold, and false positive (FP) error rates vary by up to 615-fold, depending on provider and pipeline. For identical raw data, DRAGEN and the in-house pipeline differ by up to 36.3-fold in FP error rates. Moderately recurrent FP-prone alleles, although representing only 5.37% of all FP sites, contribute 36.7% of total FP errors in the Geninus in-house result. Among 22 discordant variant calls between DRAGEN and in-house analyses, more than half are not confirmed by single base extension assays, indicating likely false positives. Compared to DRAGEN, a conventional BWA + GATK Mutect2 pipeline maintains equivalent sensitivity but produces a 4-fold increase in FP errors, along with a notable enrichment of recurrent FP-prone alleles.
Conclusions: T-NGS results from certified providers exhibit substantial variability in both sensitivity and FP error rates. Conventional pipelines not only increase FP errors but also accumulate recurrent FP-prone alleles. These findings underscore the urgent need for standardized pipelines and rigorous quality control measures to ensure the reliability of T-NGS in clinical diagnostics.
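The comparisons reported above rest on simple ratios, which the following minimal sketch makes explicit. It is an illustration only, not the authors' analysis code: the `CallSummary` type, the function names, and the example counts are all hypothetical, and the definition of FP rate as false positives per assessed reference position is an assumption. Only the two percentages in the last step (5.37% of FP sites, 36.7% of FP errors) come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class CallSummary:
    """Hypothetical per-pipeline tally against a reference-standard mixture."""
    true_positives: int
    false_negatives: int
    false_positives: int
    positions_assessed: int  # assumed denominator for the FP rate


def sensitivity(s: CallSummary) -> float:
    # Fraction of expected variants that were actually called.
    return s.true_positives / (s.true_positives + s.false_negatives)


def fp_rate(s: CallSummary) -> float:
    # False positive calls per assessed position (assumed definition).
    return s.false_positives / s.positions_assessed


def fold_difference(a: float, b: float) -> float:
    # Ratio of the larger value to the smaller one.
    high, low = max(a, b), min(a, b)
    if low == 0:
        raise ValueError("fold difference is undefined when one value is zero")
    return high / low


# Made-up counts for two pipelines run on the same raw data.
pipeline_a = CallSummary(95, 5, 4, 1_000_000)
pipeline_b = CallSummary(95, 5, 16, 1_000_000)

fp_fold = fold_difference(fp_rate(pipeline_a), fp_rate(pipeline_b))
print(f"FP fold difference: {fp_fold:.1f}")

# Over-representation of recurrent FP-prone alleles, using the abstract's figures:
# 5.37% of FP sites account for 36.7% of FP errors.
enrichment = 36.7 / 5.37
print(f"Recurrent-allele enrichment: {enrichment:.2f}-fold")
```

With these made-up counts both pipelines have the same sensitivity (0.95) while their FP rates differ 4-fold, which mirrors the kind of contrast the abstract describes. The last line shows that the recurrent FP-prone sites contribute roughly 6.8 times more errors than their share of sites would suggest.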
Title: Evaluation of false positive and false negative errors in targeted next generation sequencing. Journal: Genome Biology 26(1):409. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12670792/pdf/
Pub Date: 2025-12-01. DOI: 10.1186/s13059-025-03877-z
Axel Jensen, Emma R Horton, Mardoché B Koko, Kate M Detwiler, Katerina Guschanski
Title: Holotype genome of the lesula provides insights into demography and evolution of a threatened primate lineage. Journal: Genome Biology 26(1):408. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12667059/pdf/
Pub Date: 2025-11-29. DOI: 10.1186/s13059-025-03886-y
Zixu Ran, Xudong Guo, Tong Pan, Yue Bi, Yi Hao, Heyun Sun, Jiangning Song, Fuyi Li
Background: Protein function research helps in understanding the complex biological processes that occur within cells. However, the intricate nature of protein structures and functions, along with the rapid growth of protein sequence data, presents a pressing challenge to develop efficient computational methods for accurate protein annotation.
Results: In this study, we propose ENGINE, a multi-channel deep learning framework designed for robust protein function prediction. ENGINE integrates an equivariant graph convolutional network model to capture geometric features from protein 3D structures, leverages the large language model ESM-C to encode evolutionary and sequence-derived information, and adds an innovative 3D sequence representation that unifies spatial and sequential signals. We show that ENGINE consistently surpasses current state-of-the-art methods across diverse protein function prediction benchmarks, with robust generalisation and high predictive accuracy. Beyond performance, ENGINE provides interpretable insights into key sequence features and structural motifs, enabling the identification of functionally critical residues and substructures within proteins. This facilitates a deeper mechanistic understanding of protein function annotation outcomes and supports hypothesis generation for downstream biological studies.
Conclusion: By offering reliable predictions with biological interpretability, ENGINE contributes to advancing research into cellular processes and disease mechanisms. The model is available at GitHub ( https://github.com/ABILiLab/ENGINE ) and Zenodo ( https://doi.org/10.5281/zenodo.17221153 ), serving as a valuable tool for the broader scientific community.
Title: A scalable equivariant graph network framework for precise protein function prediction. Journal: Genome Biology 26(1):407. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12665208/pdf/
Pub Date: 2025-11-28. DOI: 10.1186/s13059-025-03878-y
Matthew A Knox, Valter Almeida, Gladys Kalema-Zikusoka, Stephen Rubanga, Alex Ngabirano, David T S Hayman
Background: Mitochondrial DNA is a key marker for assessing genetic diversity, critical for the conservation of endangered species. This study investigates the mitochondrial diversity of the Bwindi Impenetrable National Park (BINP) mountain gorilla population (Gorilla beringei beringei), one of the most endangered primate subspecies.
Results: Using pooled sequencing of 200 faecal samples collected from both habituated and wild gorillas, we identify ten mtDNA variants exceeding a 20% threshold across the population mitogenome. Comparisons with previously sequenced individual BINP gorilla mitogenomes corroborate these findings and reveal additional putative haplotypes, potential heteroplasmy and nuclear mitochondrial DNA segments. Our approach overcomes challenges associated with pooled samples, distinguishing sequencing noise from biological variation. The observed diversity suggests that mitochondrial variability in mountain gorillas is comparable to the higher levels reported in the closely related Grauer's gorilla (G. beringei graueri).
Conclusions: This study demonstrates the utility of non-invasive faecal sampling and pooled sequencing for assessing genetic diversity in challenging field conditions, highlighting its potential for population-level genetic monitoring of non-human primates. Our findings provide valuable insights into the genetic makeup of this critically endangered population, contributing to future conservation efforts, and supporting the recovery of mountain gorillas.
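As a rough illustration of how a frequency threshold separates sequencing noise from real variation in pooled data, the sketch below filters per-position base counts. It is not the study's pipeline, which the abstract does not describe. It assumes that the "20% threshold" refers to the non-reference allele frequency in the pool; the `pooled_variants` function, the `min_depth` parameter, and the toy pileup are hypothetical.

```python
from collections import Counter


def pooled_variants(pileup, reference, min_freq=0.20, min_depth=10):
    """Return (position, ref_base, alt_base, frequency) for alleles above min_freq.

    pileup:    dict mapping position -> Counter of observed bases in the pool
    reference: dict mapping position -> reference base
    min_depth: hypothetical coverage floor; low-depth sites are skipped
    """
    calls = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to estimate a frequency
        for base, n in counts.items():
            if base == reference[pos]:
                continue  # only non-reference alleles are reported
            freq = n / depth
            if freq > min_freq:  # "exceeding" the threshold, so strictly greater
                calls.append((pos, reference[pos], base, freq))
    return calls


# Toy pooled pileup: one clear variant, one likely error, one low-coverage site.
pileup = {
    100: Counter({"A": 70, "G": 30}),  # G at 30%, above the threshold
    200: Counter({"C": 95, "T": 5}),   # T at 5%, treated as noise
    300: Counter({"G": 3, "A": 2}),    # depth 5, skipped
}
reference = {100: "A", 200: "C", 300: "G"}

calls = pooled_variants(pileup, reference)
for pos, ref, alt, freq in calls:
    print(f"{pos}\t{ref}>{alt}\t{freq:.2f}")
```

In a pool of many individuals, an allele's read frequency approximates its frequency in the sampled population, so a high cutoff keeps common haplotype-defining variants and discards rare alleles along with sequencing errors.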
Title: Mitochondrial diversity of Bwindi Impenetrable National Park Mountain Gorillas. Journal: Genome Biology 26(1):405. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12661816/pdf/
Pub Date: 2025-11-26. DOI: 10.1186/s13059-025-03883-1
Ming-Ching C Wen, Joshua D Welch
Transposable elements (TEs) are often epigenetically repressed in eukaryotic cells, but still affect the molecular state of the cell in certain contexts. A flurry of recent studies has elucidated new effects of TE sequences in eukaryotic cells. We review these emerging molecular effects, covering new mechanisms by which TE sequences affect the cell: pre- and post-transcriptional regulation of gene expression; cell-to-cell transmission of genes within a multicellular organism through virus-like activity; and RNA-guided DNA insertion. Recent demonstration of TE-guided genome editing underscores the importance of these investigations for both basic and translational research. Future work is needed to continue to unravel the molecular effects of TE sequences across developmental stages, across cell types, and in various diseases.
Title: Molecular effects of transposable element sequences in mammalian cells. Journal: Genome Biology 26(1):403. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12649098/pdf/
Pub Date: 2025-11-26. DOI: 10.1186/s13059-025-03812-2
William Cross, Iben Lyskjær, Christopher Davies, Abigail Bunkum, Ana Maia Rocha, Tom Lesluyes, Fernanda Amary, Roberto Tirabosco, Cristina Naceur-Lombardelli, Mariam Jamal-Hanjani, Charles Swanton, Nischalan Pillay, Simone Zaccaria, Adrienne M Flanagan, Peter Van Loo
Driver mutations in IDH1 and IDH2 are initiating events in the evolution of chondrosarcoma and several other cancer types. Here, we present evidence that mutant IDH1 is recurrently lost in metastatic central chondrosarcoma. This may reflect either relaxed positive selection for the mutant IDH1 locus, or negative selection for the hypermethylation phenotype later in tumor evolution. This finding highlights the challenge for therapeutic intervention by mutant IDH1 inhibitors in chondrosarcoma.
Title: Loss of IDH1 and IDH2 mutations during the evolution of metastatic chondrosarcoma. Journal: Genome Biology 26(1):404. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12659308/pdf/
Pub Date: 2025-11-24. DOI: 10.1186/s13059-025-03854-6
Uberto Pozzoli, Diego Forni, Federica Arrigoni, Rachele Cagliani, Luca De Gioia, Manuela Sironi
Intrinsically disordered protein regions (IDRs) are implicated in diverse cellular processes in eukaryotes and, in these organisms, they cover up to 40% of the proteome. Surprisingly little is known about IDRs in bacterial proteomes. Specifically, a number of questions remain unanswered, such as the role of these regions in host–pathogen interactions, their adaptive potential and evolutionary trajectories, as well as their biophysical properties. Here we focus on Mycobacterium tuberculosis and take advantage of the fact that, due to its extreme epidemiological relevance, several large-scale analyses are available. After benchmarking different disorder prediction tools, we integrate multiple levels of biological information to show that IDR-containing proteins are involved in virulence, in the modulation of host immune response, and in lipid metabolism. Mycobacterium tuberculosis IDRs are fast evolving and poorly antigenic, and they display specific sequence-ensemble-function relationships. Conversely, human proteins that interact with Mycobacterium tuberculosis are evolutionarily constrained, widely expressed, and highly connected in the human interactome map. This indicates that the classical arms race paradigm is not universal in host–pathogen interactions. We also extend the analysis to 540 human-infecting bacteria and highlight wide variations in IDR representation and conformational properties. Our data point to a role of IDRs in contributing to bacterial virulence, interaction with the human host, and control of immune responses. Although this awaits experimental validation, we suggest that Mycobacterium tuberculosis also uses IDRs to sense and interact with its environment. Herein, we provide a database of bacterial IDRs, together with relevant parameters, for public use.
Title: Mycobacterium tuberculosis uses intrinsically disordered, fast evolving proteins to interact with conserved host factors. Journal: Genome Biology.
HGMT is a database designed to analyze, explore, and visualize gut microbiomes from diverse tumor types. We process metagenomic datasets from 18,630 stool samples across 37 tumor types, including 2,207 samples from immunotherapy-treated patients across 12 tumor types. HGMT provides an interactive portal for querying taxonomic and functional profiles, visualizing differentially abundant taxa in tumors across datasets, and identifying their pan-tumor associations. Our analysis reveals the capability of gut microbiota in diagnosing gastrointestinal tumors and predicting immunotherapy response for non-small cell lung carcinoma. HGMT represents a valuable resource for investigating the roles of gut microbiota in tumors and immunotherapy response.
Title: HGMT: a database of human gut microbiota for tumors and immunotherapy response. Authors: Jinxin Liu, Mingyu Wang, Chentao Xu, Longhao Jia, Senying Lai, Zi-Chao Zhang, Jinglong Zhang, Wei-Hua Chen, Yucheng T. Yang, Xing-Ming Zhao. Pub Date: 2025-11-24. DOI: 10.1186/s13059-025-03865-3. Journal: Genome Biology.
Pub Date: 2025-11-24. DOI: 10.1186/s13059-025-03862-6
Yang Li, Guanyu Qiao, Hongli Du, Xin Gao, Guohua Wang
Single-cell transcriptomics enables precise characterization of cellular heterogeneity, but current pre-trained models relying solely on expression data fail to capture gene associations. We present scKGBERT, a knowledge-enhanced foundation model integrating 41 M single-cell RNA-seq profiles and 8.9 M protein–protein interactions to jointly learn gene and cell representations. scKGBERT employs Gaussian attention to emphasize key genes and improve biomarker identification, achieving superior performance across gene annotation, drug response, and disease prediction tasks. scKGBERT enhances biological interpretability and offers a powerful resource for precision medicine and disease mechanism discovery.
Title: scKGBERT: a knowledge-enhanced foundation model for single-cell transcriptomics. Journal: Genome Biology.