Genome Research: latest articles

Epigenetic and evolutionary features of ape subterminal heterochromatin.
IF 5.5 | Zone 2 (Biology) | Q1 Biochemistry & Molecular Biology | Pub Date: 2026-01-05 | DOI: 10.1101/gr.280987.125
DongAhn Yoo, Katherine M Munson, Evan E Eichler

Many African great ape chromosomes possess large subterminal heterochromatic caps at their telomeres that are conspicuously absent from the human lineage. Leveraging the complete sequences of great ape genomes, we characterize the organization of subterminal caps and reconstruct the evolutionary history of these regions in chimpanzees and gorillas. Detailed analyses of the composition of the associated terminal 32 bp satellite array from chimpanzee (termed pCht) and intervening segmental duplication (SD) spacers confirm two independent origins in the Pan and gorilla lineages. In chimpanzee and bonobo, we estimate these structures emerged ∼7.7 million years ago (MYA) in contrast to gorilla, in which they expanded more recently, ∼5.0 MYA, and now make up 8.5% of the total gorilla genome. In both lineages, the SD spacers punctuating the pCht heterochromatic satellite arrays correspond to pockets of decreased methylation, although in gorilla such regions are significantly less methylated (P < 2.2 × 10⁻¹⁶) than in chimpanzee or bonobo. Allelic pairs of subterminal caps show a higher degree of sequence divergence than euchromatic sequences, with bonobo showing less divergent haplotypes and less differentially methylated spacers. In contrast, we identify virtually identical subterminal caps mapping to nonhomologous chromosomes within a species, suggesting ectopic recombination potentially mediated by SD spacers. We find that the transition regions from heterochromatic subterminal caps to euchromatin are enriched for structural variant insertions and lineage-specific duplicated genes. Our findings suggest independent evolution of subterminal caps converging on a common genetic and epigenetic structure that promoted ectopic exchange as well as the emergence of novel genes at transition regions between euchromatin and heterochromatin.

Citations: 0
The superpowers of imprinting control regions.
IF 5.5 | Zone 2 (Biology) | Q1 Biochemistry & Molecular Biology | Pub Date: 2026-01-05 | DOI: 10.1101/gr.281215.125
Bertille Montibus, Franck Court, Philippe Arnaud

Genomic imprinting is a specialized mechanism of transcriptional regulation whereby approximately 200 mammalian genes are expressed monoallelically according to their parental origin. This crucial developmental process is primarily controlled by discrete cis-regulatory elements known as imprinting control regions (ICRs), which play essential roles in directing allele-specific gene expression across large imprinted domains. In this review, we highlight the features that define ICRs as a distinct class of cis-regulatory regions, from their ability to maintain germline-inherited DNA methylation to their multifunctional roles in transcriptional control. For each imprinted domain, we examine the diverse mechanisms by which individual ICRs integrate multiple regulatory functions to coordinate both proximal and distal imprinted gene expression. By uncovering the multifaceted roles of ICRs, this review provides a compelling framework for understanding, more broadly, the molecular basis of finely controlled gene expression.

Citations: 0
A spectral component approach leveraging Identity-by-Descent graphs to address recent population structure in genomic analysis.
IF 5.5 | Zone 2 (Biology) | Q1 Biochemistry & Molecular Biology | Pub Date: 2025-12-23 | DOI: 10.1101/gr.280659.125
Ruhollah Shemirani, Gillian M Belbin, Sinead Cullina, Christa Caggiano, Christopher Gignoux, Noah Zaitlen, Eimear Kenny

Population structure is a well-known confounder in statistical genetics, particularly in genome-wide association studies (GWAS), where it can lead to inflated test statistics and spurious associations. Traditional methods, such as principal components (PCs), commonly used to adjust for population structure, are limited in capturing fine-scale, nonlinear patterns that arise from recent demographic events - patterns that are crucial for understanding rare variant effects. To address this challenge, we propose a novel method called SPectral Components (SPCs), which leverages identity-by-descent (IBD) graphs to capture and transform local, nonlinear fine-scale population structure into continuous representations that can be seamlessly integrated into genetic analysis pipelines. Using both simulated datasets and empirical data from the UK Biobank (N ≈ 420,000), we demonstrate that SPCs outperform PCs in adjusting for fine-scale population structure. In simulations, SPCs explained over 90% of the fine-scale population structure with fewer components, while PCs captured less than 5%. In the UK Biobank, SPCs reduced the inflation of P-values in the GWAS of an environmental-driven phenotype by 12% compared to PCs, while maintaining a similar performance to PCs in height, a highly heritable phenotype. Additionally, SPCs improved rare variant association analyses, reducing genomic inflation (e.g., from 7.6 to 1.2 in one analysis), and provided more accurate heritability estimates. Spatial autocorrelation analysis further confirmed the ability of SPCs to account for environmental effects, reducing Moran's I for both environmental and heritable phenotypes more effectively than PCs. Overall, our findings demonstrate that SPCs provide a robust, scalable adjustment for recent population structure, offering a powerful alternative or complement to PCs in large-scale biobank studies.
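
The abstract above uses Moran's I as its measure of spatial autocorrelation. As a minimal sketch of the standard statistic only (not the authors' pipeline), the Python below computes I = (N / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where W is the sum of all weights. The function name `morans_i` and the toy chain of four sites are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def morans_i(values: Sequence[float], weights: Sequence[Sequence[float]]) -> float:
    """Return Moran's I for `values` under the spatial weight matrix `weights`.

    weights[i][j] is the (hypothetical) spatial weight between sites i and j;
    the diagonal is expected to be zero.
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]

    total_weight = sum(sum(row) for row in weights)
    variance_sum = sum(d * d for d in deviations)
    if total_weight == 0 or variance_sum == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")

    # Weighted sum of cross-products of deviations over all ordered pairs.
    cross = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / total_weight) * (cross / variance_sum)


if __name__ == "__main__":
    # Toy example: four sites on a line, each adjacent to its neighbours.
    chain = [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ]
    print(morans_i([1, 1, 0, 0], chain))  # spatially clustered values
    print(morans_i([1, 0, 1, 0], chain))  # alternating values
```

Values near zero indicate little spatial structure in a phenotype or residual, which is the direction the abstract describes when it says an adjustment reduces Moran's I.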

Citations: 0
Interpretable phenotype decoding from multicondition sequencing data with ALPINE.
IF 5.5 | Zone 2 (Biology) | Q1 Biochemistry & Molecular Biology | Pub Date: 2025-12-03 | DOI: 10.1101/gr.280566.125
Wei-Hao Lee, Lechuan Li, Ruth Dannenfelser, Vicky Yao

As sequencing techniques advance in precision, affordability, and diversity, an abundance of heterogeneous sequencing data has become available, encompassing a wide range of phenotypic features and biological perturbations. Unfortunately, increased resolution comes with the cost of increased complexity of the biological search space, even at the individual study level, as perturbations are now often examined across many dimensions simultaneously, including different donor phenotypes, anatomical regions and cell types, and time points. Furthermore, broad integration across studies promises a unique opportunity to explore the molecular underpinnings of distinct healthy and disease states, larger than the original scope of the individual study. To fully realize the promise of both individual higher resolution studies and large cross-study integrations, we need a robust methodology that can disentangle the influence of technical and nonrelevant phenotypic factors, isolating relevant condition-specific signals from shared biological information while also providing interpretable insights into the genetic effects of these conditions. Current methods typically excel in only one of these areas. To address this gap, we have developed ALPINE, a supervised nonnegative matrix factorization (NMF) framework that effectively separates both technical and nontechnical factors while simultaneously offering direct interpretability of condition-associated genes. Through simulations across four different scenarios, we demonstrate that ALPINE outperforms existing methods in both isolating the effect of different phenotypic conditions and prioritizing condition-associated genes. Furthermore, ALPINE has favorable performance in batch effect removal compared with state-of-the-art integration methods. When applied to real-world case studies, we showcase how ALPINE can be used to extract insights into the biological mechanisms that underlie differences between phenotypic conditions.
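
The abstract above describes ALPINE as a supervised nonnegative matrix factorization (NMF) framework but does not give its objective or update rules. The sketch below therefore shows only plain, unsupervised NMF with the classical Lee-Seung multiplicative updates for the squared-error objective, to illustrate what factorizing a nonnegative matrix X into W and H means. It is not ALPINE's algorithm; the function names, the toy matrix, and the defaults are illustrative assumptions.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def frobenius_error(x: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared Frobenius norm of X - WH."""
    wh = matmul(w, h)
    return sum(
        (x[i][j] - wh[i][j]) ** 2 for i in range(len(x)) for j in range(len(x[0]))
    )


def nmf(x: Matrix, rank: int, iterations: int = 200, seed: int = 0,
        eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Factor nonnegative X (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T X) / (W^T W H), elementwise
        wt = transpose(w)
        num = matmul(wt, x)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (X H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num = matmul(x, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


if __name__ == "__main__":
    # Toy "samples x genes" matrix built from two nonnegative patterns.
    x = [[1, 0, 2], [0, 3, 1], [1, 3, 3], [2, 0, 4]]
    w, h = nmf(x, rank=2)
    print(frobenius_error(x, w, h))
```

Because W and H stay nonnegative, each column of W can be read as an additive loading on one latent pattern, which is the property that makes NMF-style factors directly interpretable at the gene level.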

Citations: 0
BayesRVAT enhances rare-variant association testing through Bayesian aggregation of functional annotations.
IF 5.5 | Zone 2 (Biology) | Q1 Biochemistry & Molecular Biology | Pub Date: 2025-12-03 | DOI: 10.1101/gr.280689.125
Antonio Nappi, Liubov Shilova, Theofanis Karaletsos, Na Cai, Francesco Paolo Casale

Gene-level rare variant association tests (RVATs) are essential for uncovering disease mechanisms and identifying therapeutic targets. Advances in sequence-based machine learning have generated diverse variant pathogenicity scores, creating opportunities to improve RVATs. However, existing methods often rely on rigid models or single annotations, limiting their ability to leverage these advances. Here, we introduce BayesRVAT, a Bayesian rare variant association test that jointly models multiple annotations. By specifying priors on annotation effects and estimating gene- and trait-specific posterior burden scores, BayesRVAT flexibly captures diverse rare-variant architectures. In simulations, BayesRVAT improves power while maintaining calibration. In UK Biobank analyses, it detects 10.2% more blood-trait associations and reveals novel gene-disease links, including PRPH2 with retinal disease. Integrating BayesRVAT within omnibus frameworks further increases discoveries, demonstrating that flexible annotation modeling captures complementary signals beyond existing burden and variance-component tests.
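
The abstract above refers to burden scores that aggregate rare variants using several functional annotations, without giving the model. As a rough sketch of the general idea only, the Python below turns per-variant annotations into a variant weight and sums weighted allele counts per individual. The logistic link, the fixed `effects` vector, and all names are hypothetical choices made for illustration; BayesRVAT itself places priors on annotation effects and estimates posterior scores, which this sketch does not do.

```python
import math
from typing import List, Sequence


def variant_weights(annotations: Sequence[Sequence[float]],
                    effects: Sequence[float]) -> List[float]:
    """Map each variant's annotation vector to a weight in (0, 1).

    Assumption: weight = sigmoid(annotation . effects). This link function is
    illustrative and is not taken from the paper.
    """
    weights = []
    for row in annotations:
        score = sum(a * e for a, e in zip(row, effects))
        weights.append(1.0 / (1.0 + math.exp(-score)))
    return weights


def burden_scores(genotypes: Sequence[Sequence[int]],
                  weights: Sequence[float]) -> List[float]:
    """Per-individual burden: weighted sum of minor-allele counts in one gene."""
    return [sum(g * w for g, w in zip(person, weights)) for person in genotypes]


if __name__ == "__main__":
    # Three variants in one gene, two hypothetical annotations per variant.
    annotations = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    effects = [2.0, -2.0]  # fixed here; a Bayesian method would infer these
    weights = variant_weights(annotations, effects)

    # Four individuals, allele counts (0, 1 or 2) at each of the three variants.
    genotypes = [[1, 0, 0], [0, 1, 0], [0, 0, 2], [0, 0, 0]]
    print(burden_scores(genotypes, weights))
```

The resulting per-individual scores would then be tested for association with a trait, for example as a covariate in a regression.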

Citations: 0
ML-MAGES enables multivariate genetic association analyses with genes and effect size shrinkage
IF 7 | Zone 2 (Biology) | Q1 Biochemistry & Molecular Biology | Pub Date: 2025-11-19 | DOI: 10.1101/gr.280440.125
Xiran Liu, Lorin Crawford, Sohini Ramachandran
A fundamental goal of genetics is to identify which and how genetic variants are associated with a trait, often using the regression results from genome-wide association (GWA) studies. Important methodological challenges account for inflation in GWA effect estimates as well as in investigating more than one trait simultaneously. We leverage machine learning approaches for these two challenges, developing a computationally efficient method called ML-MAGES. First, we shrink the inflation in GWA effect sizes caused by nonindependence among variants using neural networks. We then cluster variant associations among multiple traits via variational inference. We compare the performance of shrinkage via neural networks to regularized regression and fine-mapping, two approaches used for addressing inflated effects but dealing with variants in focal regions of different sizes. Our neural network shrinkage outperforms both methods in approximating the true effect sizes in simulated data. Our infinite mixture clustering approach offers a flexible, data-driven way to distinguish different types of associations—trait-specific, shared across traits, or nonprioritized—among multiple traits based on their regularized effects. Clustering applied to our neural network shrinkage results also produces consistently higher precision and recall for distinguishing gene-level associations in simulations. We demonstrate the application of ML-MAGES on association analyses of two quantitative traits and two binary traits in the UK Biobank. Our identified associated genes from single-trait enrichment tests overlap with those having known relevant biological processes to the traits. Besides trait-specific associations, ML-MAGES identifies several variants with shared multitrait associations, suggesting putative shared genetic architecture.
Citations: 0
Integration of high-throughput proteomic data and complementary omics layers with PriOmics
IF 7 | Zone 2 (Biology) | Q1 Biochemistry & Molecular Biology | Pub Date: 2025-11-18 | DOI: 10.1101/gr.279487.124
Robin Kosch, Katharina Limm, Annette M. Staiger, Nadine S. Kurz, Nicole Seifert, Bence Oláh, Stefan Solbrig, Viola Poeschel, Gerhard Held, Marita Ziepert, Norbert Schmitz, Emil Chteinberg, Reiner Siebert, Rainer Spang, Helena U. Zacharias, German Ott, Peter J. Oefner, Michael Altenbuchinger
High-throughput bottom-up proteomics data cover 1,000s of proteins and related co- and post-translational modifications (CTMs/PTMs). Yet, it remains an open question how to holistically explore such data and their relationship to complementary omics/phenotypical information. Graphical models are particularly suited to study molecular networks and underlying regulatory mechanisms, as they can distinguish direct from indirect relationships, aside from their generalizability to diverse data types. We propose PriOmics to integrate proteomics data with complementary omics and phenotypical data. PriOmics models intensities of individual proteotypic peptides and incorporates their protein affiliation as prior knowledge to resolve statistical relationships between proteins and CTMs/PTMs. This was verified in simulation studies, which also demonstrate that PriOmics can disentangle regulatory effects of protein modifications from those of respective protein abundances. These findings were substantiated in a Diffuse Large B-Cell Lymphoma (DLBCL) dataset where we integrated SWATH-MS-based proteomics with transcriptomic and phenotypic data.
Citations: 0
Polycomb misregulation in enterocytes drives tissue decline in the aging Drosophila intestine
IF 7 Zone 2 Biology Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2025-11-17 DOI: 10.1101/gr.281058.125
Sarah Leichter, Kami Ahmad, Steve Henikoff
Aging compromises intestinal integrity, yet the chromatin changes driving this decline remain unclear. Polycomb-mediated repression is essential for silencing developmental genes, but this regulatory mechanism becomes dysregulated with age. Although shifts in Polycomb regulation within intestinal stem cells have been linked to gut aging, the Polycomb landscape of differentiated cell types remains unexplored. Differentiated cells comprise the majority of the gut epithelium and directly impact both tissue and whole organismal aging. Using single-cell chromatin profiling of the Drosophila intestine, we identify cell type-specific chromatin landscape changes during aging. We find that old enterocytes aberrantly repress genes essential for transmembrane transport and chitin metabolism, contributing to intestinal barrier decline – an example of antagonistic pleiotropy in a regenerative tissue. Barrier decline leads to derepression of JAK/STAT ligands in all cell types and increased proliferation of aging stem cells, with elevated RNA Polymerase II (RNAPII) at S-phase-dependent histone genes. Specific upregulation of histone genes during aging stem cell proliferation resembles RNAPII hypertranscription of histone genes in aggressive human cancers. Our work reveals that misregulation of the Polycomb-mediated H3K27me3 histone modification in differentiated cells during aging not only underlies tissue decline but also mirrors transcriptional changes in cancer, suggesting a common mechanism linking aging and cancer progression.
Citations: 0
Joint imputation and deconvolution of gene expression across spatial transcriptomics platforms
IF 7 Zone 2 Biology Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2025-11-17 DOI: 10.1101/gr.280555.125
Hongyu Zheng, Hirak Sarkar, Benjamin J. Raphael
Spatially resolved transcriptomics (SRT) technologies measure gene expression across thousands of spatial locations within a tissue slice. Multiple SRT technologies are currently available and others are in active development, with each technology having varying spatial resolution (subcellular, single-cell, or multicellular regions), gene coverage (targeted vs. whole-transcriptome), and sequencing depth per location. For example, the widely used 10x Genomics Visium platform measures whole transcriptomes from multiple-cell-sized spots, whereas the 10x Genomics Xenium platform measures a few hundred genes at subcellular resolution. A number of studies apply multiple SRT technologies to slices that originate from the same biological tissue. Integration of data from different SRT technologies can overcome limitations of the individual technologies, enabling the imputation of expression from unmeasured genes in targeted technologies and/or the deconvolution of admixed expression from technologies with lower spatial resolution. Here, we introduce Spatial Integration for Imputation and Deconvolution (SIID), an algorithm to reconstruct a latent spatial gene expression matrix from a pair of observations from different SRT technologies. SIID leverages a spatial alignment and uses a joint nonnegative factorization model to accurately impute missing gene expression and infer gene expression signatures of cell types from admixed SRT data. In simulations involving paired SRT data sets from different technologies (e.g., Xenium and Visium), SIID shows superior performance in reconstructing spot-to-cell-type assignments, recovering cell type–specific gene expression and imputing missing data compared to contemporary tools. When applied to real-world 10x Xenium-Visium pairs from human breast and colon cancer tissues, SIID achieves highest performance in imputing holdout gene expression.
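To make the idea of nonnegative factorization for deconvolution concrete, here is a minimal sketch of plain nonnegative matrix factorization with Lee-Seung multiplicative updates. It is not SIID: the joint model over two platforms and the spatial alignment described in the abstract are not reproduced, and the toy matrix `spots`, the rank, and all function names are assumptions made for illustration. Rows of `spots` stand for admixed spots and columns for genes, so the factor `w` plays the role of spot-to-cell-type loadings and `h` the role of cell-type expression signatures.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frob_error(v, w, h):
    """Squared Frobenius norm of v - w*h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor nonnegative v (n x m) into w (n x rank) and h (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frob_error(v, w, h)]
    for _ in range(n_iter):
        # h <- h * (w^T v) / (w^T w h), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # w <- w * (v h^T) / (w h h^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
        errors.append(frob_error(v, w, h))
    return w, h, errors


# Hypothetical data: 4 spots x 3 genes, each spot a mixture of two cell types
# with signatures [10, 0, 2] and [0, 8, 2].
spots = [[10.0, 0.0, 2.0],
         [0.0, 8.0, 2.0],
         [5.0, 4.0, 2.0],
         [2.5, 6.0, 2.0]]
w, h, errors = nmf(spots, rank=2)
print(f"reconstruction error: {errors[0]:.2f} -> {errors[-1]:.4f}")
```

Multiplicative updates are used here only because they keep both factors nonnegative without any projection step; the abstract does not say which optimizer SIID uses.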
Citations: 0
OMKar automates genome karyotyping using optical maps to identify constitutional abnormalities
IF 7 Zone 2 Biology Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2025-11-14 DOI: 10.1101/gr.280536.125
Siavash Raeisi Dehkordi, Zhaoyang Jia, Joey Estabrook, Jen Hauenstein, Neil Miller, Naz Güleray-Lafci, Jürgen Neesen, Alex Hastie, Alka Chaubey, Andy Wing Chun Pang, Paul Dremsek, Vineet Bafna
The whole-genome karyotype refers to the sequence of large chromosomal segments comprising an individual's genotype. Karyotype analysis, which includes identifying aneuploidies and structural rearrangements, is essential for understanding genetic risk factors, informing diagnosis and treatment, and guiding genetic counseling in constitutional disorders. The current karyotyping standard relies on microscopic chromosome examination, a complex and expertise-dependent process with megabase-scale resolution. Optical genome mapping (OGM) technology offers an efficient approach to detect large-scale genomic lesions. Here, we introduce OMKar, a computational method that generates virtual karyotypes from OGM data. OMKar integrates structural variants (SVs) and copy number (CN) variants into a breakpoint graph representation. It re-estimates CNs using integer linear programming to enforce CN balance and then identifies constrained Eulerian paths corresponding to full chromosome structures. OMKar is evaluated on 38 whole-genome simulations of constitutional disorders, achieving 88% precision and 95% recall for SV concordance and a 95% Jaccard score for CN concordance. We further apply OMKar to 154 clinical samples including 50 prenatal, 41 postnatal, and 63 parental genomes collected across 10 sites. It correctly reconstructs the karyotype in 144 cases, including 25 of 25 aneuploidies, 32 of 32 balanced translocations, and 72 of 82 unbalanced rearrangements. Identified disorders include cri-du-chat, Wolf–Hirschhorn, Prader–Willi, Down, and Turner syndromes. Notably, OMKar uncovers plausible genetic mechanisms in five previously unexplained cases. These results demonstrate the accuracy and utility of OMKar for OGM-based constitutional karyotyping.
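The path-finding step mentioned in the abstract can be sketched with the classic Hierholzer algorithm for an unconstrained Eulerian path in an undirected multigraph. This is not OMKar's method: the integer linear programming step and the constraints OMKar places on its paths are not modelled, and the breakpoint names `p0` to `p3` and the toy edge list are hypothetical. Each edge appears once per copy, so a segment with copy number 2 is listed twice.

```python
from collections import defaultdict


def eulerian_path(edges):
    """Return a node list that uses every edge exactly once, or None if impossible."""
    adj = defaultdict(list)  # node -> list of (neighbour, edge id)
    for eid, (u, v) in enumerate(edges):
        adj[u].append((v, eid))
        adj[v].append((u, eid))
    odd = [node for node in adj if len(adj[node]) % 2 == 1]
    if len(odd) not in (0, 2):
        return None  # an Eulerian path needs zero or two odd-degree nodes
    start = odd[0] if odd else next(iter(adj), None)
    if start is None:
        return []
    used = [False] * len(edges)
    stack, path = [start], []
    while stack:
        node = stack[-1]
        # Drop edges that were already walked from the other endpoint.
        while adj[node] and used[adj[node][-1][1]]:
            adj[node].pop()
        if adj[node]:
            nxt, eid = adj[node].pop()
            used[eid] = True
            stack.append(nxt)
        else:
            path.append(stack.pop())
    if len(path) != len(edges) + 1:
        return None  # some edges are unreachable from the start node
    return path[::-1]


# Hypothetical toy graph: segment p1-p2 has copy number 2, and one
# structural-variant edge joins p2 back to p1 (a tandem duplication).
edges = [("p0", "p1"),   # segment A, copy number 1
         ("p1", "p2"),   # segment B, first copy
         ("p1", "p2"),   # segment B, second copy
         ("p2", "p1"),   # structural-variant edge
         ("p2", "p3")]   # segment C, copy number 1
path = eulerian_path(edges)
print(path)
```

Because the sketch ignores edge types, it may traverse a segment in an orientation that a real karyotype reconstruction would forbid; restricting which edges may follow one another is what the "constrained" in the abstract refers to.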
Citations: 0