Genome Biology: Latest Articles


Evaluation of false positive and false negative errors in targeted next generation sequencing.

IF 10.1 | Zone 1 (Biology) | Q1 Biotechnology & Applied Microbiology

Genome Biology

Pub Date : 2025-12-01 DOI: 10.1186/s13059-025-03882-2

Youngbeen Moon, Young-Ho Kim, Jong-Kwang Kim, Chung Hwan Hong, Eun-Kyung Kang, Hye Won Choi, Dong-Eun Lee, Tae-Min Kim, Seong Gu Heo, Namshik Han, Kyeong-Man Hong

Background: Next-generation sequencing (NGS) has become an indispensable diagnostic tool across various diseases. However, sequencing and analysis errors remain major barriers to clinical implementation. In cancer diagnostics, detecting low-level somatic variants is particularly challenging due to tumor heterogeneity and contamination from normal cells.

Results: We assess targeted next-generation sequencing (T-NGS) performance using reference-standard DNA mixtures of homozygote hydatidiform mole and heterozygote blood DNA at varying ratios, analyzed by certified NGS providers. Analytical sensitivity differs by up to 13.9-fold, and false positive (FP) error rates vary up to 615-fold, depending on provider and pipeline. For identical raw data, DRAGEN and the in-house pipeline differ by up to 36.3-fold in FP error rates. Moderately recurrent FP-prone alleles, although representing only 5.37% of all FP sites, contribute to 36.7% of total FP errors in the Geninus in-house result. Among 22 discordant variant calls between DRAGEN and in-house analyses, more than half of them are not confirmed by single base extension assays, indicating likely false positives. Compared to DRAGEN, a conventional BWA + GATK Mutect2 pipeline maintains equivalent sensitivity but produces a 4-fold increase in FP errors, along with a notable enrichment of recurrent FP-prone alleles.

Conclusions: T-NGS results from certified providers exhibit substantial variability in both sensitivity and FP error rates. Conventional pipelines not only increase FP errors but also accumulate recurrent FP-prone alleles. These findings underscore the urgent need for standardized pipelines and rigorous quality control measures to ensure the reliability of T-NGS in clinical diagnostics.
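The fold differences quoted in this abstract are ratios between per-provider or per-pipeline metrics. The sketch below shows one plausible way such figures could be computed from confusion counts against a reference standard. The abstract does not give the authors' exact definitions, so the `CallSetCounts` container, the per-site FP rate (FP / (FP + TN)), and the function names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CallSetCounts:
    """Confusion counts for one provider/pipeline run (hypothetical container)."""
    true_positives: int
    false_negatives: int
    false_positives: int
    true_negatives: int


def sensitivity(c: CallSetCounts) -> float:
    """Fraction of known variants that were called: TP / (TP + FN)."""
    return c.true_positives / (c.true_positives + c.false_negatives)


def fp_rate(c: CallSetCounts) -> float:
    """Fraction of known-negative sites called as variant: FP / (FP + TN).

    Assumed definition; the source abstract does not state the denominator.
    """
    return c.false_positives / (c.false_positives + c.true_negatives)


def fold_difference(a: float, b: float) -> float:
    """Ratio of the larger value to the smaller one, so the result is >= 1."""
    larger, smaller = max(a, b), min(a, b)
    if smaller == 0:
        raise ValueError("fold difference is undefined when one value is zero")
    return larger / smaller
```

With made-up counts, `fold_difference(fp_rate(run_a), fp_rate(run_b))` compares two runs on the same reference mixture. Taking the larger value over the smaller keeps the ratio independent of argument order, which matches how "up to N-fold" comparisons are usually phrased.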



Citations: 0

Holotype genome of the lesula provides insights into demography and evolution of a threatened primate lineage.

IF 10.1 | Zone 1 (Biology) | Q1 Biotechnology & Applied Microbiology

Genome Biology

Pub Date : 2025-12-01 DOI: 10.1186/s13059-025-03877-z

Axel Jensen, Emma R Horton, Mardoché B Koko, Kate M Detwiler, Katerina Guschanski

Citations: 0

A scalable equivariant graph network framework for precise protein function prediction.

IF 10.1 | Zone 1 (Biology) | Q1 Biotechnology & Applied Microbiology

Genome Biology

Pub Date : 2025-11-29 DOI: 10.1186/s13059-025-03886-y

Zixu Ran, Xudong Guo, Tong Pan, Yue Bi, Yi Hao, Heyun Sun, Jiangning Song, Fuyi Li

Background: Protein function research helps in understanding the complex biological processes that occur within cells. However, the intricate nature of protein structures and functions, along with the rapid growth of protein sequence data, presents a pressing challenge to develop efficient computational methods for accurate protein annotation.

Results: In this study, we propose ENGINE, a multi-channel deep learning framework designed for robust protein function prediction. ENGINE integrates an equivariant graph convolutional network model to capture geometric features from protein 3D structures, leverages the large language model ESM-C to encode evolutionary and sequence-derived information, and adds an innovative 3D sequence representation that unifies spatial and sequential signals. We show that ENGINE consistently surpasses current state-of-the-art methods across diverse protein function prediction benchmarks, with robust generalisation and high predictive accuracy. Beyond performance, ENGINE provides interpretable insights into key sequence features and structural motifs, enabling the identification of functionally critical residues and substructures within proteins. This facilitates a deeper mechanistic understanding of protein function annotation outcomes and supports hypothesis generation for downstream biological studies.

Conclusion: By offering reliable predictions with biological interpretability, ENGINE contributes to advancing research into cellular processes and disease mechanisms. The model is available at GitHub ( https://github.com/ABILiLab/ENGINE ) and Zenodo ( https://doi.org/10.5281/zenodo.17221153 ), serving as a valuable tool for the broader scientific community.



Citations: 0

Mitochondrial diversity of Bwindi Impenetrable National Park Mountain Gorillas.

IF 10.1 | Zone 1 (Biology) | Q1 Biotechnology & Applied Microbiology

Genome Biology

Pub Date : 2025-11-28 DOI: 10.1186/s13059-025-03878-y

Matthew A Knox, Valter Almeida, Gladys Kalema-Zikusoka, Stephen Rubanga, Alex Ngabirano, David T S Hayman

Background: Mitochondrial DNA is a key marker for assessing genetic diversity, critical for the conservation of endangered species. This study investigates the mitochondrial diversity of the Bwindi Impenetrable National Park (BINP) mountain gorilla population (Gorilla beringei beringei), one of the most endangered primate subspecies.

Results: Using pooled sequencing of 200 faecal samples collected from both habituated and wild gorillas, we identify ten mtDNA variants exceeding a 20% threshold across the population mitogenome. Comparisons with previously sequenced individual BINP gorilla mitogenomes corroborate these findings and reveal additional putative haplotypes, potential heteroplasmy and nuclear mitochondrial DNA segments. Our approach overcomes challenges associated with pooled samples, distinguishing sequencing noise from biological variation. The observed diversity suggests that mitochondrial variability in mountain gorillas is comparable to the higher levels reported in the closely related Grauer's gorilla (G. beringei graueri).

Conclusions: This study demonstrates the utility of non-invasive faecal sampling and pooled sequencing for assessing genetic diversity in challenging field conditions, highlighting its potential for population-level genetic monitoring of non-human primates. Our findings provide valuable insights into the genetic makeup of this critically endangered population, contributing to future conservation efforts, and supporting the recovery of mountain gorillas.
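The "20% threshold" in the Results can be read as a cut-off on the frequency of a variant allele within the pooled reads. The following is a minimal sketch of that kind of filter, not the authors' pipeline: the `PooledSite` record, the read-count fields, the `min_depth` guard, and the interpretation of the threshold as alternate-read fraction are all assumptions introduced here for illustration.

```python
from typing import List, NamedTuple


class PooledSite(NamedTuple):
    """Read counts at one mitogenome position in a pooled sample (hypothetical)."""
    position: int
    ref: str
    alt: str
    alt_reads: int
    total_reads: int


def alt_frequency(site: PooledSite) -> float:
    """Fraction of reads supporting the alternate allele; 0.0 if no coverage."""
    if site.total_reads == 0:
        return 0.0
    return site.alt_reads / site.total_reads


def variants_above_threshold(
    sites: List[PooledSite],
    threshold: float = 0.20,
    min_depth: int = 1,
) -> List[PooledSite]:
    """Keep sites whose alternate-allele frequency strictly exceeds the threshold.

    min_depth is an assumed safeguard against calling variants from a handful
    of reads; it is not taken from the source abstract.
    """
    return [
        s for s in sites
        if s.total_reads >= min_depth and alt_frequency(s) > threshold
    ]
```

A strict comparison is used because the abstract speaks of variants "exceeding" the threshold. In a pooled design a high cut-off like this trades sensitivity to rare haplotypes for robustness against sequencing noise, which is consistent with the abstract's emphasis on separating noise from biological variation.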



Citations: 0

HAlign-G: rapid and low-memory multiple-genome aligner for large-scale closely related genomes.

IF 10.1 | Zone 1 (Biology) | Q1 Biotechnology & Applied Microbiology

Genome Biology

Pub Date : 2025-11-28 DOI: 10.1186/s13059-025-03881-3

Pinglu Zhang, Tong Zhou, Yanming Wei, Qinzhong Tian, Yixiao Zhai, Yizheng Wang, Quan Zou, Furong Tang, Ximei Luo

Citations: 0

Molecular effects of transposable element sequences in mammalian cells.

IF 10.1 | Zone 1 (Biology) | Q1 Biotechnology & Applied Microbiology

Genome Biology

Pub Date : 2025-11-26 DOI: 10.1186/s13059-025-03883-1

Ming-Ching C Wen, Joshua D Welch

Transposable elements (TEs) are often epigenetically repressed in eukaryotic cells, but still affect the molecular state of the cell in certain contexts. A flurry of recent studies has elucidated new effects of TE sequences in eukaryotic cells. We review these emerging molecular effects, covering a variety of new mechanisms by which TE sequences affect the cell: pre- and post-transcriptional regulation of gene expression; cell-to-cell transmission of genes within a multicellular organism through virus-like activity; and RNA-guided DNA insertion. Recent demonstration of TE-guided genome editing underscores the importance of these investigations for both basic and translational research. Future work is needed to continue to unravel the molecular effects of TE sequences across developmental stages, across cell types, and in various diseases.


Citations: 0

Loss of IDH1 and IDH2 mutations during the evolution of metastatic chondrosarcoma.

IF 10.1 | Zone 1 (Biology) | Q1 Biotechnology & Applied Microbiology

Genome Biology

Pub Date : 2025-11-26 DOI: 10.1186/s13059-025-03812-2

William Cross, Iben Lyskjær, Christopher Davies, Abigail Bunkum, Ana Maia Rocha, Tom Lesluyes, Fernanda Amary, Roberto Tirabosco, Cristina Naceur-Lombardelli, Mariam Jamal-Hanjani, Charles Swanton, Nischalan Pillay, Simone Zaccaria, Adrienne M Flanagan, Peter Van Loo

Driver mutations in IDH1 and IDH2 are initiating events in the evolution of chondrosarcoma and several other cancer types. Here, we present evidence that mutant IDH1 is recurrently lost in metastatic central chondrosarcoma. This may reflect either relaxed positive selection for the mutant IDH1 locus, or negative selection for the hypermethylation phenotype later in tumor evolution. This finding highlights the challenge for therapeutic intervention by mutant IDH1 inhibitors in chondrosarcoma.


Citations: 0

Mycobacterium tuberculosis uses intrinsically disordered, fast evolving proteins to interact with conserved host factors

IF 12.3 | Zone 1 (Biology) | Q1 Biotechnology & Applied Microbiology

Genome Biology

Pub Date : 2025-11-24 DOI: 10.1186/s13059-025-03854-6

Uberto Pozzoli, Diego Forni, Federica Arrigoni, Rachele Cagliani, Luca De Gioia, Manuela Sironi

Intrinsically disordered protein regions (IDRs) are implicated in diverse cellular processes in eukaryotes and, in these organisms, they cover up to 40% of the proteome. Surprisingly little is known about IDRs in bacterial proteomes. Specifically, a number of questions remain unanswered, such as the role of these regions in host–pathogen interactions, their adaptive potential and evolutionary trajectories, as well as their biophysical properties. Here we focus on Mycobacterium tuberculosis and take advantage of the fact that, due to its extreme epidemiological relevance, several large-scale analyses are available. After benchmarking different disorder prediction tools, we integrate multiple levels of biological information to show that IDR-containing proteins are involved in virulence, in the modulation of host immune response, and in lipid metabolism. Mycobacterium tuberculosis IDRs are fast evolving and poorly antigenic, and they display specific sequence-ensemble-function relationships. Conversely, human proteins that interact with Mycobacterium tuberculosis are evolutionarily constrained, widely expressed, and highly connected in the human interactome map. This indicates that the classical arms race paradigm is not universal in host–pathogen interactions. We also extend the analysis to 540 human-infecting bacteria and underscore wide variations in IDR representation and conformational properties. Our data point to a role of IDRs in contributing to bacterial virulence, interaction with the human host, and control of immune responses. Although this awaits experimental validation, we suggest that Mycobacterium tuberculosis also uses IDRs to sense and interact with its environment. Herein, we provide a database of bacterial IDRs, together with relevant parameters, for public use.



Citations: 0

HGMT: a database of human gut microbiota for tumors and immunotherapy response

IF 12.3 | Zone 1 (Biology) | Q1 Biotechnology & Applied Microbiology

Genome Biology

Pub Date : 2025-11-24 DOI: 10.1186/s13059-025-03865-3

Jinxin Liu, Mingyu Wang, Chentao Xu, Longhao Jia, Senying Lai, Zi-Chao Zhang, Jinglong Zhang, Wei-Hua Chen, Yucheng T. Yang, Xing-Ming Zhao

HGMT is a database designed to analyze, explore, and visualize gut microbiomes from diverse tumor types. We process metagenomic datasets from 18,630 stool samples across 37 tumor types, including 2,207 samples from immunotherapy-treated patients across 12 tumor types. HGMT provides an interactive portal for querying taxonomic and functional profiles, visualizing cross-dataset differential abundance taxa in tumors, and identifying their pan-tumor associations. Our analysis reveals the capability of gut microbiota in diagnosing gastrointestinal tumors and predicting immunotherapy response for non-small cell lung carcinoma. HGMT represents a valuable resource for investigating the roles of gut microbiota in tumors and immunotherapy response.


Citations: 0

scKGBERT: a knowledge-enhanced foundation model for single-cell transcriptomics

IF 12.3 | Zone 1 (Biology) | Q1 Biotechnology & Applied Microbiology

Genome Biology

Pub Date : 2025-11-24 DOI: 10.1186/s13059-025-03862-6

Yang Li, Guanyu Qiao, Hongli Du, Xin Gao, Guohua Wang

Single-cell transcriptomics enables precise characterization of cellular heterogeneity, but current pre-trained models relying solely on expression data fail to capture gene associations. We present scKGBERT, a knowledge-enhanced foundation model integrating 41 M single-cell RNA-seq profiles and 8.9 M protein–protein interactions to jointly learn gene and cell representations. scKGBERT employs Gaussian attention to emphasize key genes and improve biomarker identification, achieving superior performance across gene annotation, drug response, and disease prediction tasks. scKGBERT enhances biological interpretability and offers a powerful resource for precision medicine and disease mechanism discovery.


Citations: 0
