
Genome Biology: Latest Articles

HAlign-G: rapid and low-memory multiple-genome aligner for large-scale closely related genomes.
IF 10.1 | Zone 1 (Biology) | Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY | Pub Date: 2025-11-28 | DOI: 10.1186/s13059-025-03881-3
Pinglu Zhang, Tong Zhou, Yanming Wei, Qinzhong Tian, Yixiao Zhai, Yizheng Wang, Quan Zou, Furong Tang, Ximei Luo
Citations: 0
Molecular effects of transposable element sequences in mammalian cells.
IF 10.1 | Zone 1 (Biology) | Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY | Pub Date: 2025-11-26 | DOI: 10.1186/s13059-025-03883-1
Ming-Ching C Wen, Joshua D Welch

Transposable elements (TEs) are often epigenetically repressed in eukaryotic cells, but still affect the molecular state of the cell in certain contexts. A flurry of recent studies has elucidated new effects of TE sequences in eukaryotic cells. We review these emerging molecular effects of TEs, covering a variety of new mechanisms by which TE sequences affect the cell, including pre- and post-transcriptional regulation of gene expression; cell-to-cell transmission of genes within a multicellular organism through virus-like activity; and RNA-guided DNA insertion. Recent demonstration of TE-guided genome editing underscores the importance of these investigations for both basic and translational research. Future work is needed to continue to unravel the molecular effects of TE sequences across developmental stages, across cell types, and in various diseases.

Citations: 0
Loss of IDH1 and IDH2 mutations during the evolution of metastatic chondrosarcoma.
IF 10.1 | Zone 1 (Biology) | Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY | Pub Date: 2025-11-26 | DOI: 10.1186/s13059-025-03812-2
William Cross, Iben Lyskjær, Christopher Davies, Abigail Bunkum, Ana Maia Rocha, Tom Lesluyes, Fernanda Amary, Roberto Tirabosco, Cristina Naceur-Lombardelli, Mariam Jamal-Hanjani, Charles Swanton, Nischalan Pillay, Simone Zaccaria, Adrienne M Flanagan, Peter Van Loo

Driver mutations in IDH1 and IDH2 are initiating events in the evolution of chondrosarcoma and several other cancer types. Here, we present evidence that mutant IDH1 is recurrently lost in metastatic central chondrosarcoma. This may reflect either relaxed positive selection for the mutant IDH1 locus, or negative selection for the hypermethylation phenotype later in tumor evolution. This finding highlights the challenge for therapeutic intervention by mutant IDH1 inhibitors in chondrosarcoma.

Citations: 0
Mycobacterium tuberculosis uses intrinsically disordered, fast evolving proteins to interact with conserved host factors
IF 12.3 | Zone 1 (Biology) | Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY | Pub Date: 2025-11-24 | DOI: 10.1186/s13059-025-03854-6
Uberto Pozzoli, Diego Forni, Federica Arrigoni, Rachele Cagliani, Luca De Gioia, Manuela Sironi
Intrinsically disordered protein regions (IDRs) are implicated in diverse cellular processes in eukaryotes and, in these organisms, they cover up to 40% of the proteome. Surprisingly little is known about IDRs in bacterial proteomes. Specifically, a number of questions remain unanswered, such as the role of these regions in host–pathogen interactions, their adaptive potential and evolutionary trajectories, as well as their biophysical properties. Here we focus on Mycobacterium tuberculosis and take advantage of the fact that, due to its extreme epidemiological relevance, several large-scale analyses are available. After benchmarking different disorder prediction tools, we integrate multiple levels of biological information to show that IDR-containing proteins are involved in virulence, in the modulation of host immune response, and in lipid metabolism. Mycobacterium tuberculosis IDRs are fast evolving and poorly antigenic, and they display specific sequence-ensemble-function relationships. Conversely, human proteins that interact with Mycobacterium tuberculosis are evolutionarily constrained, widely expressed, and highly connected in the human interactome map. This indicates that the classical arms race paradigm is not universal in host–pathogen interactions. We also extend the analysis to 540 human-infecting bacteria and we underscore wide variations in IDR representation and conformational properties. Our data point to a role of IDRs in contributing to bacterial virulence, interaction with the human host, and control of immune responses. Although this awaits experimental validation, we suggest that Mycobacterium tuberculosis also uses IDRs to sense and interact with its environment. Herein, we provide a database of bacterial IDRs, together with relevant parameters, for public use.
Citations: 0
HGMT: a database of human gut microbiota for tumors and immunotherapy response
IF 12.3 | Zone 1 (Biology) | Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY | Pub Date: 2025-11-24 | DOI: 10.1186/s13059-025-03865-3
Jinxin Liu, Mingyu Wang, Chentao Xu, Longhao Jia, Senying Lai, Zi-Chao Zhang, Jinglong Zhang, Wei-Hua Chen, Yucheng T. Yang, Xing-Ming Zhao
HGMT is a database designed to analyze, explore, and visualize gut microbiomes from diverse tumor types. We process metagenomic datasets from 18,630 stool samples across 37 tumor types, including 2,207 samples from immunotherapy-treated patients across 12 tumor types. HGMT provides an interactive portal for querying taxonomic and functional profiles, visualizing cross-dataset differential abundance taxa in tumors, and identifying their pan-tumor associations. Our analysis reveals the capability of gut microbiota in diagnosing gastrointestinal tumors and predicting immunotherapy response for non-small cell lung carcinoma. HGMT represents a valuable resource for investigating the roles of gut microbiota in tumors and immunotherapy response.
Citations: 0
scKGBERT: a knowledge-enhanced foundation model for single-cell transcriptomics
IF 12.3 | Zone 1 (Biology) | Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY | Pub Date: 2025-11-24 | DOI: 10.1186/s13059-025-03862-6
Yang Li, Guanyu Qiao, Hongli Du, Xin Gao, Guohua Wang
Single-cell transcriptomics enables precise characterization of cellular heterogeneity, but current pre-trained models relying solely on expression data fail to capture gene associations. We present scKGBERT, a knowledge-enhanced foundation model integrating 41 M single-cell RNA-seq profiles and 8.9 M protein–protein interactions to jointly learn gene and cell representations. scKGBERT employs Gaussian attention to emphasize key genes and improve biomarker identification, achieving superior performance across gene annotation, drug response, and disease prediction tasks. scKGBERT enhances biological interpretability and offers a powerful resource for precision medicine and disease mechanism discovery.
Citations: 0
A reinforcement learning-based approach for dynamic privacy protection in genomic data sharing beacons
IF 12.3 | Zone 1 (Biology) | Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY | Pub Date: 2025-11-24 | DOI: 10.1186/s13059-025-03871-5
Masoud Poorghaffar Aghdam, Sobhan Shukueian Tabrizi, Kerem Ayöz, Erman Ayday, Sinem Sav, A. Ercüment Çiçek
The rise of genomic sequencing raises privacy concerns due to the identifiable nature of genomic data. The GA4GH Beacon Project enables privacy-preserving data sharing but is vulnerable to membership inference attacks that reveal individual participation. Existing defenses, such as noise addition and query restrictions, rely on static policies that attackers can bypass. We introduce the first reinforcement learning (RL)-based dynamic defense for the beacon protocol, training defender and attacker agents in a multiplayer setting. Our approach adapts responses in real time, distinguishing users from adversaries and balancing privacy with utility against evolving threats.
Citations: 0
Long-read structural variant discovery and targeted short read genotyping enables population scale characterization of structural variation in rhesus macaques
IF 12.3 | Zone 1 (Biology) | Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY | Pub Date: 2025-11-21 | DOI: 10.1186/s13059-025-03873-3
Karina Ray, Christina Mulch, Samuel M. Peterson, Sebastian Benjamin, Nathan Gullicksrud, Adam J. Ericsen, Eric J. Vallender, Betsy M. Ferguson, Jeffrey D. Wall, Benjamin N. Bimber
Due to their close evolutionary relationship with humans, rhesus macaques are an important pre-clinical model. While genetic diversity driven by short nucleotide variation has long been studied in rhesus macaques, there is comparatively little known about structural variation, with most published studies focused on cross-species comparative analyses. Understanding the degree and implications of intraspecies structural variation is essential to all biomedical research using rhesus macaques as a model. Here we present long-read sequencing of 59 rhesus macaques, identifying a catalog of 339,334 structural variants (SVs), which we subsequently genotype in a cohort of 2,645 individuals with short read whole genome sequencing data to create the largest public dataset of rhesus macaque SVs. These data reveal population structure within rhesus macaque SVs based on both geographic ancestry and to a lesser degree, breeding center. While there is evidence of strong purifying selection against SVs within exons, 0.7% of SVs overlap exons, with an average of 16.9 rare SVs per subject predicted to have a high impact on protein coding sequences. Notably, rhesus macaque SVs are dominated by Alu retrotransposition events, which comprise 55.7% of SVs and suggest significantly different modes of SV formation relative to humans and great apes. This dataset represents the largest study of structural variation in rhesus macaques to date and demonstrates use of both long and short-read datasets to generate SV genotype data. These data enable the consideration of structural variation impact in rhesus macaque-based research and will also aid the development of primate pangenomes.
Citations: 0
Benchmarking deep learning methods for biologically conserved single-cell integration
IF 12.3 | Zone 1 (Biology) | Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY | Pub Date: 2025-11-20 | DOI: 10.1186/s13059-025-03869-z
Chenxin Yi, Jinyu Cheng, Jiajun Chen, Wanquan Liu, Junwei Liu, Yixue Li
Advancements in single-cell RNA sequencing have enabled the analysis of millions of cells, but integrating such data across samples and methods while mitigating batch effects remains challenging. Deep learning approaches address this by learning biologically conserved gene expression representations, yet systematic benchmarking of loss functions and integration performance is lacking. We evaluate 16 integration methods using a unified variational autoencoder framework, incorporating batch and cell-type information. Results reveal limitations in the single-cell integration benchmarking index (scIB) for preserving intra-cell-type information. To address this, we introduce a correlation-based loss function and enhance benchmarking metrics to better capture biological conservation. Using cell annotations from lung and breast atlases, our approach improves biological signal preservation. We propose a refined integration framework, scIB-E, and metrics that provide deeper insights into the integration process and offer guidance for advanced developments in integrating increasingly complex single-cell data. This benchmark highlights the potential of deep learning-based approaches for single-cell data integration, emphasizing the importance of biologically informed metrics and improved benchmarking strategies.
Citations: 0
Structure-enhanced graph meta learning for few-shot gene regulatory network inference
IF 12.3 | Zone 1 (Biology) | Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY | Pub Date: 2025-11-20 | DOI: 10.1186/s13059-025-03860-8
Weiming Yu, Zhuobin Chen, Yaohua Hu, Jing Qin, Le Ou-Yang
Inferring gene regulatory networks (GRNs) is essential for understanding biological regulation. Although numerous deep learning approaches have been developed for GRN inference, most require large amounts of labeled data. We present Meta-TGLink, a structure-enhanced graph meta-learning model for few-shot GRN inference. By formulating GRN inference as a link prediction task, Meta-TGLink captures transferable regulatory patterns while reducing dependence on extensive labeled datasets. The model combines graph neural networks with Transformer architectures to integrate relational and positional information, thereby improving predictive performance under data-scarce conditions. Experiments on real datasets demonstrate its superiority over state-of-the-art baselines, particularly in cross-domain few-shot scenarios.
Citations: 0
Journal: Genome Biology