Genome Biology: latest articles

Mitochondrial diversity of Bwindi Impenetrable National Park Mountain Gorillas.

IF 10.1 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY

Genome Biology

Pub Date : 2025-11-28 DOI: 10.1186/s13059-025-03878-y

Matthew A Knox, Valter Almeida, Gladys Kalema-Zikusoka, Stephen Rubanga, Alex Ngabirano, David T S Hayman

Background: Mitochondrial DNA is a key marker for assessing genetic diversity, critical for the conservation of endangered species. This study investigates the mitochondrial diversity of the Bwindi Impenetrable National Park (BINP) mountain gorilla population (Gorilla beringei beringei), one of the most endangered primate subspecies.

Results: Using pooled sequencing of 200 faecal samples collected from both habituated and wild gorillas, we identify ten mtDNA variants exceeding a 20% threshold across the population mitogenome. Comparison with previously sequenced individual BINP gorilla mitogenomes corroborates these findings and reveals additional putative haplotypes, potential heteroplasmy and nuclear mitochondrial DNA segments. Our approach overcomes challenges associated with pooled samples, distinguishing sequencing noise from biological variation. The observed diversity suggests that mitochondrial variability in mountain gorillas is comparable to the higher levels reported in the closely related Grauer's gorilla (G. beringei graueri).

Conclusions: This study demonstrates the utility of non-invasive faecal sampling and pooled sequencing for assessing genetic diversity in challenging field conditions, highlighting its potential for population-level genetic monitoring of non-human primates. Our findings provide valuable insights into the genetic makeup of this critically endangered population, contributing to future conservation efforts, and supporting the recovery of mountain gorillas.
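As a rough illustration of the 20% threshold mentioned in the Results, the sketch below filters pooled read counts by alternate-allele frequency. The `PooledSite` record, the example read counts, and the `call_pooled_variants` helper are hypothetical and are not taken from the paper; the abstract does not describe the authors' actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class PooledSite:
    """Read counts at one mitogenome position in a pooled sample (hypothetical record)."""
    position: int   # 1-based coordinate, illustrative only
    ref_reads: int  # reads supporting the reference allele
    alt_reads: int  # reads supporting the alternate allele


def variant_frequency(site: PooledSite) -> float:
    """Fraction of reads carrying the alternate allele; 0.0 when there is no coverage."""
    depth = site.ref_reads + site.alt_reads
    return site.alt_reads / depth if depth else 0.0


def call_pooled_variants(sites, threshold=0.20):
    """Keep sites whose alternate-allele frequency exceeds the threshold."""
    return [s for s in sites if variant_frequency(s) > threshold]


# Made-up counts: only the second site passes the 20% cut-off.
sites = [
    PooledSite(position=100, ref_reads=900, alt_reads=100),  # 10%
    PooledSite(position=250, ref_reads=600, alt_reads=400),  # 40%
    PooledSite(position=777, ref_reads=0, alt_reads=0),      # no coverage
]
called = call_pooled_variants(sites)
print([s.position for s in called])
```

A fixed frequency cut-off like this is only one simple way to separate low-level sequencing noise from variants that are common in the pool; it cannot by itself distinguish heteroplasmy or nuclear mitochondrial segments.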

Citations: 0

HAlign-G: rapid and low-memory multiple-genome aligner for large-scale closely related genomes.

IF 10.1 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY

Genome Biology

Pub Date : 2025-11-28 DOI: 10.1186/s13059-025-03881-3

Pinglu Zhang, Tong Zhou, Yanming Wei, Qinzhong Tian, Yixiao Zhai, Yizheng Wang, Quan Zou, Furong Tang, Ximei Luo

Citations: 0

Molecular effects of transposable element sequences in mammalian cells.

IF 10.1 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY

Genome Biology

Pub Date : 2025-11-26 DOI: 10.1186/s13059-025-03883-1

Ming-Ching C Wen, Joshua D Welch

Transposable elements (TEs) are often epigenetically repressed in eukaryotic cells, but still affect the molecular state of the cell in certain contexts. A flurry of recent studies has elucidated new effects of TE sequences in eukaryotic cells. We review these emerging molecular effects of TEs, covering a variety of new mechanisms by which TE sequences affect the cell: pre- and post-transcriptional regulation of gene expression; cell-to-cell transmission of genes within a multicellular organism through virus-like activity; and RNA-guided DNA insertion. Recent demonstration of TE-guided genome editing underscores the importance of these investigations for both basic and translational research. Future work is needed to continue to unravel the molecular effects of TE sequences across developmental stages, across cell types, and in various diseases.

Citations: 0

Loss of IDH1 and IDH2 mutations during the evolution of metastatic chondrosarcoma.

IF 10.1 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY

Genome Biology

Pub Date : 2025-11-26 DOI: 10.1186/s13059-025-03812-2

William Cross, Iben Lyskjær, Christopher Davies, Abigail Bunkum, Ana Maia Rocha, Tom Lesluyes, Fernanda Amary, Roberto Tirabosco, Cristina Naceur-Lombardelli, Mariam Jamal-Hanjani, Charles Swanton, Nischalan Pillay, Simone Zaccaria, Adrienne M Flanagan, Peter Van Loo

Driver mutations in IDH1 and IDH2 are initiating events in the evolution of chondrosarcoma and several other cancer types. Here, we present evidence that mutant IDH1 is recurrently lost in metastatic central chondrosarcoma. This may reflect either relaxed positive selection for the mutant IDH1 locus, or negative selection for the hypermethylation phenotype later in tumor evolution. This finding highlights the challenge for therapeutic intervention by mutant IDH1 inhibitors in chondrosarcoma.

Citations: 0

Mycobacterium tuberculosis uses intrinsically disordered, fast evolving proteins to interact with conserved host factors

IF 12.3 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY

Genome Biology

Pub Date : 2025-11-24 DOI: 10.1186/s13059-025-03854-6

Uberto Pozzoli, Diego Forni, Federica Arrigoni, Rachele Cagliani, Luca De Gioia, Manuela Sironi

Intrinsically disordered protein regions (IDRs) are implicated in diverse cellular processes in eukaryotes and, in these organisms, they cover up to 40% of the proteome. Surprisingly little is known about IDRs in bacterial proteomes. Specifically, a number of questions remain unanswered, such as the role of these regions in host–pathogen interactions, their adaptive potential and evolutionary trajectories, as well as their biophysical properties. Here we focus on Mycobacterium tuberculosis and take advantage of the fact that, due to its extreme epidemiological relevance, several large-scale analyses are available. After benchmarking different disorder prediction tools, we integrate multiple levels of biological information to show that IDR-containing proteins are involved in virulence, in the modulation of host immune response, and in lipid metabolism. Mycobacterium tuberculosis IDRs are fast evolving and poorly antigenic, and they display specific sequence-ensemble-function relationships. Conversely, human proteins that interact with Mycobacterium tuberculosis are evolutionarily constrained, widely expressed, and highly connected in the human interactome map. This indicates that the classical arms race paradigm is not universal in host–pathogen interactions. We also extend the analysis to 540 human-infecting bacteria and underscore wide variations in IDR representation and conformational properties. Our data point to a role of IDRs in contributing to bacterial virulence, interaction with the human host, and control of immune responses. Although this awaits experimental validation, we suggest that Mycobacterium tuberculosis also uses IDRs to sense and interact with its environment. Herein, we provide a database of bacterial IDRs, together with relevant parameters, for public use.

Citations: 0

HGMT: a database of human gut microbiota for tumors and immunotherapy response

IF 12.3 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY

Genome Biology

Pub Date : 2025-11-24 DOI: 10.1186/s13059-025-03865-3

Jinxin Liu, Mingyu Wang, Chentao Xu, Longhao Jia, Senying Lai, Zi-Chao Zhang, Jinglong Zhang, Wei-Hua Chen, Yucheng T. Yang, Xing-Ming Zhao

HGMT is a database designed to analyze, explore, and visualize gut microbiomes from diverse tumor types. We process metagenomic datasets from 18,630 stool samples across 37 tumor types, including 2,207 samples from immunotherapy-treated patients across 12 tumor types. HGMT provides an interactive portal for querying taxonomic and functional profiles, visualizing cross-dataset differential abundance taxa in tumors, and identifying their pan-tumor associations. Our analysis reveals the capability of gut microbiota in diagnosing gastrointestinal tumors and predicting immunotherapy response for non-small cell lung carcinoma. HGMT represents a valuable resource for investigating the roles of gut microbiota in tumors and immunotherapy response.

Citations: 0

scKGBERT: a knowledge-enhanced foundation model for single-cell transcriptomics

IF 12.3 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY

Genome Biology

Pub Date : 2025-11-24 DOI: 10.1186/s13059-025-03862-6

Yang Li, Guanyu Qiao, Hongli Du, Xin Gao, Guohua Wang

Single-cell transcriptomics enables precise characterization of cellular heterogeneity, but current pre-trained models relying solely on expression data fail to capture gene associations. We present scKGBERT, a knowledge-enhanced foundation model integrating 41 M single-cell RNA-seq profiles and 8.9 M protein–protein interactions to jointly learn gene and cell representations. scKGBERT employs Gaussian attention to emphasize key genes and improve biomarker identification, achieving superior performance across gene annotation, drug response, and disease prediction tasks. scKGBERT enhances biological interpretability and offers a powerful resource for precision medicine and disease mechanism discovery.

Citations: 0

A reinforcement learning-based approach for dynamic privacy protection in genomic data sharing beacons

IF 12.3 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY

Genome Biology

Pub Date : 2025-11-24 DOI: 10.1186/s13059-025-03871-5

Masoud Poorghaffar Aghdam, Sobhan Shukueian Tabrizi, Kerem Ayöz, Erman Ayday, Sinem Sav, A. Ercüment Çiçek

The rise of genomic sequencing raises privacy concerns due to the identifiable nature of genomic data. The GA4GH Beacon Project enables privacy-preserving data sharing but is vulnerable to membership inference attacks that reveal individual participation. Existing defenses, such as noise addition and query restrictions, rely on static policies that attackers can bypass. We introduce the first reinforcement learning (RL)-based dynamic defense for the beacon protocol, training defender and attacker agents in a multiplayer setting. Our approach adapts responses in real time, distinguishing users from adversaries and balancing privacy with utility against evolving threats.
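To make the membership-inference concern concrete, here is a minimal sketch of a beacon that answers yes/no allele-presence queries, and of how the fraction of "yes" answers can hint at whether a person's genome is in the dataset. The `Beacon` class, the `yes_fraction` helper, and the toy variants are hypothetical; this is neither the GA4GH API nor the paper's reinforcement-learning defense, which the abstract does not specify.

```python
class Beacon:
    """Toy beacon: reports only whether any stored genome carries a given allele."""

    def __init__(self, genomes):
        # genomes: list of sets of (chromosome, position, alt_allele) tuples
        self._alleles = set().union(*genomes) if genomes else set()

    def query(self, chrom, pos, alt):
        """Return True if at least one genome in the dataset has this allele."""
        return (chrom, pos, alt) in self._alleles


def yes_fraction(beacon, variants):
    """Share of a person's variants for which the beacon answers 'yes'."""
    hits = sum(beacon.query(*v) for v in variants)
    return hits / len(variants)


# Made-up dataset of two genomes.
beacon = Beacon([
    {("1", 100, "A"), ("2", 200, "T")},
    {("1", 100, "A"), ("3", 300, "G")},
])

member = [("1", 100, "A"), ("2", 200, "T")]    # genome that is in the dataset
outsider = [("1", 100, "A"), ("4", 400, "C")]  # shares only a common allele

print(yes_fraction(beacon, member), yes_fraction(beacon, outsider))
```

A member's variants are all answered "yes", while an outsider matches only on alleles that happen to be shared. Static defenses alter or limit these answers by a fixed rule, which is the behavior the abstract says attackers can learn to bypass.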

Citations: 0

Long-read structural variant discovery and targeted short read genotyping enables population scale characterization of structural variation in rhesus macaques

IF 12.3 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY

Genome Biology

Pub Date : 2025-11-21 DOI: 10.1186/s13059-025-03873-3

Karina Ray, Christina Mulch, Samuel M. Peterson, Sebastian Benjamin, Nathan Gullicksrud, Adam J. Ericsen, Eric J. Vallender, Betsy M. Ferguson, Jeffrey D. Wall, Benjamin N. Bimber

Due to their close evolutionary relationship with humans, rhesus macaques are an important pre-clinical model. While genetic diversity driven by short nucleotide variation has long been studied in rhesus macaques, comparatively little is known about structural variation, with most published studies focused on cross-species comparative analyses. Understanding the degree and implications of intraspecies structural variation is essential to all biomedical research using rhesus macaques as a model. Here we present long-read sequencing of 59 rhesus macaques, identifying a catalog of 339,334 structural variants (SVs), which we subsequently genotype in a cohort of 2,645 individuals with short-read whole genome sequencing data to create the largest public dataset of rhesus macaque SVs. These data reveal population structure within rhesus macaque SVs based on both geographic ancestry and, to a lesser degree, breeding center. While there is evidence of strong purifying selection against SVs within exons, 0.7% of SVs overlap exons, with an average of 16.9 rare SVs per subject predicted to have a high impact on protein coding sequences. Notably, rhesus macaque SVs are dominated by Alu retrotransposition events, which comprise 55.7% of SVs and suggest significantly different modes of SV formation relative to humans and great apes. This dataset represents the largest study of structural variation in rhesus macaques to date and demonstrates use of both long- and short-read datasets to generate SV genotype data. These data enable the consideration of structural variation impact in rhesus macaque-based research and will also aid the development of primate pangenomes.
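The percentages in this abstract can be turned into approximate counts with simple arithmetic. The sketch below does so under the assumption that both percentages refer to the full catalog of 339,334 SVs; because the published percentages are rounded, the results are estimates and not figures reported by the authors.

```python
TOTAL_SVS = 339_334            # catalog size stated in the abstract
EXON_OVERLAP_FRACTION = 0.007  # 0.7% of SVs overlap exons
ALU_FRACTION = 0.557           # 55.7% of SVs are Alu retrotransposition events


def approx_count(total: int, fraction: float) -> int:
    """Nearest whole number of SVs implied by a rounded percentage."""
    return round(total * fraction)


exon_overlapping = approx_count(TOTAL_SVS, EXON_OVERLAP_FRACTION)
alu_derived = approx_count(TOTAL_SVS, ALU_FRACTION)

print(f"SVs overlapping exons: about {exon_overlapping:,}")
print(f"Alu-derived SVs: about {alu_derived:,}")
```

Under that assumption, roughly 2,400 SVs overlap exons and roughly 189,000 are Alu-derived.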

Citations: 0

Benchmarking deep learning methods for biologically conserved single-cell integration

IF 12.3 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY

Genome Biology

Pub Date : 2025-11-20 DOI: 10.1186/s13059-025-03869-z

Chenxin Yi, Jinyu Cheng, Jiajun Chen, Wanquan Liu, Junwei Liu, Yixue Li

Advancements in single-cell RNA sequencing have enabled the analysis of millions of cells, but integrating such data across samples and methods while mitigating batch effects remains challenging. Deep learning approaches address this by learning biologically conserved gene expression representations, yet systematic benchmarking of loss functions and integration performance is lacking. We evaluate 16 integration methods using a unified variational autoencoder framework, incorporating batch and cell-type information. Results reveal limitations in the single-cell integration benchmarking index (scIB) for preserving intra-cell-type information. To address this, we introduce a correlation-based loss function and enhance benchmarking metrics to better capture biological conservation. Using cell annotations from lung and breast atlases, our approach improves biological signal preservation. We propose a refined integration framework, scIB-E, and metrics that provide deeper insights into the integration process and offer guidance for advanced developments in integrating increasingly complex single-cell data. This benchmark highlights the potential of deep learning-based approaches for single-cell data integration, emphasizing the importance of biologically informed metrics and improved benchmarking strategies.
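The abstract mentions a correlation-based loss function but does not define it. One common form, sketched here purely as an assumption, penalizes low Pearson correlation between two vectors, for example a cell's original and reconstructed expression profiles. The function names and the toy vectors are hypothetical and may differ from what scIB-E actually uses.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlation_loss(original, reconstructed):
    """Loss in [0, 2]: 0 for perfectly correlated vectors, 2 for perfectly anti-correlated."""
    return 1.0 - pearson(original, reconstructed)


# Toy expression vectors (made up for illustration).
print(correlation_loss([1, 2, 3], [2, 4, 6]))  # same pattern, different scale
print(correlation_loss([1, 2, 3], [6, 4, 2]))  # reversed pattern
```

Because the loss depends on the pattern of expression and not its scale, it rewards representations that keep relative differences between genes or cells, which is one plausible way to preserve the intra-cell-type structure the abstract refers to.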

Citations: 0
