Genome Research: latest articles, page 5

ML-MAGES enables multivariate genetic association analyses with genes and effect size shrinkage

IF 7 | Zone 2, Biology | Q1 Biochemistry & Molecular Biology

Genome Research

Pub Date : 2025-11-19 DOI: 10.1101/gr.280440.125

Xiran Liu, Lorin Crawford, Sohini Ramachandran

A fundamental goal of genetics is to identify which genetic variants are associated with a trait and how, often using the regression results from genome-wide association (GWA) studies. Important methodological challenges include accounting for inflation in GWA effect estimates and investigating more than one trait simultaneously. We leverage machine learning approaches for these two challenges, developing a computationally efficient method called ML-MAGES. First, we use neural networks to shrink the inflation in GWA effect sizes caused by nonindependence among variants. We then cluster variant associations among multiple traits via variational inference. We compare the performance of shrinkage via neural networks to regularized regression and fine-mapping, two approaches that address inflated effects but operate on focal regions of different sizes. Our neural network shrinkage outperforms both methods in approximating the true effect sizes in simulated data. Our infinite mixture clustering approach offers a flexible, data-driven way to distinguish different types of associations (trait-specific, shared across traits, or nonprioritized) among multiple traits based on their regularized effects. Clustering applied to our neural network shrinkage results also produces consistently higher precision and recall for distinguishing gene-level associations in simulations. We demonstrate the application of ML-MAGES on association analyses of two quantitative traits and two binary traits in the UK Biobank. The associated genes we identify from single-trait enrichment tests overlap with genes involved in biological processes known to be relevant to the traits. Besides trait-specific associations, ML-MAGES identifies several variants with shared multitrait associations, suggesting putative shared genetic architecture.
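
The inflation described above can be illustrated on a toy example. Under the common approximation that marginal GWA estimates equal the linkage disequilibrium (LD) matrix times the true effects, correlated neighbors of a causal variant pick up spurious signal. The sketch below applies a ridge-type regularized regression, which is one of the baseline families the abstract mentions and not the ML-MAGES neural network. The LD matrix, effect sizes, and penalty `lam` are hypothetical values chosen for illustration.

```python
import numpy as np

def toy_ld(n_variants, rho):
    """Hypothetical LD matrix whose correlation decays as rho**|i - j|."""
    idx = np.arange(n_variants)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def ridge_shrink(beta_marginal, ld, lam):
    """Ridge estimate from summary statistics: solve (R + lam*I) b = beta_marginal."""
    return np.linalg.solve(ld + lam * np.eye(ld.shape[0]), beta_marginal)

R = toy_ld(20, 0.8)
beta_true = np.zeros(20)
beta_true[[5, 14]] = [0.5, -0.3]          # two causal variants (assumed)

beta_marginal = R @ beta_true             # noise-free marginal effects, inflated by LD
beta_shrunk = ridge_shrink(beta_marginal, R, lam=0.05)

err_marginal = np.linalg.norm(beta_marginal - beta_true)
err_shrunk = np.linalg.norm(beta_shrunk - beta_true)
print(f"error before shrinkage: {err_marginal:.3f}, after: {err_shrunk:.3f}")
```

In this noise-free setting the regularized estimate lies much closer to the sparse truth than the marginal estimates do. With real GWA noise the choice of penalty matters far more.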

Integration of high-throughput proteomic data and complementary omics layers with PriOmics

IF 7 | Zone 2, Biology | Q1 Biochemistry & Molecular Biology

Genome Research

Pub Date : 2025-11-18 DOI: 10.1101/gr.279487.124

Robin Kosch, Katharina Limm, Annette M. Staiger, Nadine S. Kurz, Nicole Seifert, Bence Oláh, Stefan Solbrig, Viola Poeschel, Gerhard Held, Marita Ziepert, Norbert Schmitz, Emil Chteinberg, Reiner Siebert, Rainer Spang, Helena U. Zacharias, German Ott, Peter J. Oefner, Michael Altenbuchinger

High-throughput bottom-up proteomics data cover thousands of proteins and related co- and post-translational modifications (CTMs/PTMs). Yet, it remains an open question how to holistically explore such data and their relationship to complementary omics/phenotypical information. Graphical models are particularly suited to studying molecular networks and underlying regulatory mechanisms because they can distinguish direct from indirect relationships and generalize to diverse data types. We propose PriOmics to integrate proteomics data with complementary omics and phenotypical data. PriOmics models intensities of individual proteotypic peptides and incorporates their protein affiliation as prior knowledge to resolve statistical relationships between proteins and CTMs/PTMs. This was verified in simulation studies, which also demonstrate that PriOmics can disentangle regulatory effects of protein modifications from those of respective protein abundances. These findings were substantiated in a Diffuse Large B-Cell Lymphoma (DLBCL) dataset where we integrated SWATH-MS-based proteomics with transcriptomic and phenotypic data.

Polycomb misregulation in enterocytes drives tissue decline in the aging Drosophila intestine

IF 7 | Zone 2, Biology | Q1 Biochemistry & Molecular Biology

Genome Research

Pub Date : 2025-11-17 DOI: 10.1101/gr.281058.125

Sarah Leichter, Kami Ahmad, Steve Henikoff

Aging compromises intestinal integrity, yet the chromatin changes driving this decline remain unclear. Polycomb-mediated repression is essential for silencing developmental genes, but this regulatory mechanism becomes dysregulated with age. Although shifts in Polycomb regulation within intestinal stem cells have been linked to gut aging, the Polycomb landscape of differentiated cell types remains unexplored. Differentiated cells comprise the majority of the gut epithelium and directly impact both tissue and whole-organism aging. Using single-cell chromatin profiling of the Drosophila intestine, we identify cell type-specific chromatin landscape changes during aging. We find that old enterocytes aberrantly repress genes essential for transmembrane transport and chitin metabolism, contributing to intestinal barrier decline, an example of antagonistic pleiotropy in a regenerative tissue. Barrier decline leads to derepression of JAK/STAT ligands in all cell types and increased proliferation of aging stem cells, with elevated RNA Polymerase II (RNAPII) at S-phase-dependent histone genes. Specific upregulation of histone genes during aging stem cell proliferation resembles RNAPII hypertranscription of histone genes in aggressive human cancers. Our work reveals that misregulation of the Polycomb-mediated H3K27me3 histone modification in differentiated cells during aging not only underlies tissue decline but also mirrors transcriptional changes in cancer, suggesting a common mechanism linking aging and cancer progression.

Joint imputation and deconvolution of gene expression across spatial transcriptomics platforms

IF 7 | Zone 2, Biology | Q1 Biochemistry & Molecular Biology

Genome Research

Pub Date : 2025-11-17 DOI: 10.1101/gr.280555.125

Hongyu Zheng, Hirak Sarkar, Benjamin J. Raphael

Spatially resolved transcriptomics (SRT) technologies measure gene expression across thousands of spatial locations within a tissue slice. Multiple SRT technologies are currently available and others are in active development, with each technology having varying spatial resolution (subcellular, single-cell, or multicellular regions), gene coverage (targeted vs. whole-transcriptome), and sequencing depth per location. For example, the widely used 10x Genomics Visium platform measures whole transcriptomes from multiple-cell-sized spots, whereas the 10x Genomics Xenium platform measures a few hundred genes at subcellular resolution. A number of studies apply multiple SRT technologies to slices that originate from the same biological tissue. Integration of data from different SRT technologies can overcome limitations of the individual technologies, enabling the imputation of expression from unmeasured genes in targeted technologies and/or the deconvolution of admixed expression from technologies with lower spatial resolution. Here, we introduce Spatial Integration for Imputation and Deconvolution (SIID), an algorithm to reconstruct a latent spatial gene expression matrix from a pair of observations from different SRT technologies. SIID leverages a spatial alignment and uses a joint nonnegative factorization model to accurately impute missing gene expression and infer gene expression signatures of cell types from admixed SRT data. In simulations involving paired SRT data sets from different technologies (e.g., Xenium and Visium), SIID shows superior performance in reconstructing spot-to-cell-type assignments, recovering cell type–specific gene expression, and imputing missing data compared to contemporary tools. When applied to real-world 10x Xenium-Visium pairs from human breast and colon cancer tissues, SIID achieves the highest performance in imputing holdout gene expression.
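
To give a feel for what a joint nonnegative factorization of two platforms can look like, the sketch below fits one shared factorization W·H to two views of the same latent expression matrix: a cell-level view restricted to a targeted gene panel, and a spot-level view that sums cells over all genes. It is a minimal illustration under strong assumptions (a known spot-by-cell aggregation matrix `agg`, squared-error loss, Lee–Seung-style multiplicative updates) and is not the SIID model or its optimizer. All names and the toy data are hypothetical.

```python
import numpy as np

def joint_loss(x_cell, x_spot, agg, targeted, w, h):
    """Squared error of both views against the shared reconstruction W @ H."""
    full = w @ h
    return float(np.sum((x_cell - full[:, targeted]) ** 2)
                 + np.sum((x_spot - agg @ full) ** 2))

def joint_nmf(x_cell, x_spot, agg, targeted, k, n_iter=200, seed=0, eps=1e-9):
    """Multiplicative updates for X_cell ~ (WH)[:, targeted] and X_spot ~ agg @ W @ H."""
    rng = np.random.default_rng(seed)
    w = rng.random((x_cell.shape[0], k)) + 0.1
    h = rng.random((k, x_spot.shape[1])) + 0.1
    losses = [joint_loss(x_cell, x_spot, agg, targeted, w, h)]
    for _ in range(n_iter):
        h_t = h[:, targeted]                       # factors for the targeted panel
        num = x_cell @ h_t.T + agg.T @ x_spot @ h.T
        den = w @ (h_t @ h_t.T) + agg.T @ (agg @ w) @ (h @ h.T)
        w *= num / (den + eps)
        aw = agg @ w                               # spot-level loadings
        num = aw.T @ x_spot
        den = (aw.T @ aw) @ h
        num[:, targeted] += w.T @ x_cell           # cell view only informs targeted genes
        den[:, targeted] += (w.T @ w) @ h_t
        h *= num / (den + eps)
        losses.append(joint_loss(x_cell, x_spot, agg, targeted, w, h))
    return w, h, losses

# Toy data: 30 cells, 12 genes, 3 latent programs, 6 spots of 5 cells each.
rng = np.random.default_rng(1)
expr = rng.random((30, 3)) @ rng.random((3, 12))   # latent cell-by-gene expression
targeted = np.array([0, 1, 2, 3])                  # genes on the targeted panel
agg = np.kron(np.eye(6), np.ones((1, 5)))          # spot-by-cell membership
x_cell, x_spot = expr[:, targeted], agg @ expr

w, h, losses = joint_nmf(x_cell, x_spot, agg, targeted, k=3)
imputed = w @ h                                    # cell-level estimate for all 12 genes
print(f"loss: {losses[0]:.2f} -> {losses[-1]:.4f}")
```

The product `w @ h` plays the role of the latent matrix: its non-targeted columns are imputed values at cell resolution, while `agg @ w` gives a spot-level decomposition. How well such a toy model recovers held-out genes depends on the data and is not asserted here.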

OMKar automates genome karyotyping using optical maps to identify constitutional abnormalities

IF 7 | Zone 2, Biology | Q1 Biochemistry & Molecular Biology

Genome Research

Pub Date : 2025-11-14 DOI: 10.1101/gr.280536.125

Siavash Raeisi Dehkordi, Zhaoyang Jia, Joey Estabrook, Jen Hauenstein, Neil Miller, Naz Güleray-Lafci, Jürgen Neesen, Alex Hastie, Alka Chaubey, Andy Wing Chun Pang, Paul Dremsek, Vineet Bafna

The whole-genome karyotype refers to the sequence of large chromosomal segments comprising an individual's genotype. Karyotype analysis, which includes identifying aneuploidies and structural rearrangements, is essential for understanding genetic risk factors, informing diagnosis and treatment, and guiding genetic counseling in constitutional disorders. The current karyotyping standard relies on microscopic chromosome examination, a complex and expertise-dependent process with megabase-scale resolution. Optical genome mapping (OGM) technology offers an efficient approach to detect large-scale genomic lesions. Here, we introduce OMKar, a computational method that generates virtual karyotypes from OGM data. OMKar integrates structural variants (SVs) and copy number (CN) variants into a breakpoint graph representation. It re-estimates CNs using integer linear programming to enforce CN balance and then identifies constrained Eulerian paths corresponding to full chromosome structures. OMKar is evaluated on 38 whole-genome simulations of constitutional disorders, achieving 88% precision and 95% recall for SV concordance and a 95% Jaccard score for CN concordance. We further apply OMKar to 154 clinical samples including 50 prenatal, 41 postnatal, and 63 parental genomes collected across 10 sites. It correctly reconstructs the karyotype in 144 cases, including 25 of 25 aneuploidies, 32 of 32 balanced translocations, and 72 of 82 unbalanced rearrangements. Identified disorders include cri-du-chat, Wolf–Hirschhorn, Prader–Willi, Down, and Turner syndromes. Notably, OMKar uncovers plausible genetic mechanisms in five previously unexplained cases. These results demonstrate the accuracy and utility of OMKar for OGM-based constitutional karyotyping.

Graph-based deep reinforcement learning for haplotype assembly with Ralphi

IF 7 | Zone 2, Biology | Q1 Biochemistry & Molecular Biology

Genome Research

Pub Date : 2025-11-14 DOI: 10.1101/gr.280569.125

Enzo Battistella, Anant Maheshwari, Barış Ekim, Bonnie Berger, Victoria Popic

Haplotype assembly is the problem of reconstructing the combination of alleles on the maternally and paternally inherited chromosome copies. Individual haplotypes are essential to our understanding of how combinations of different variants impact phenotype. In this work, we focus on read-based haplotype assembly of individual diploid genomes, which reconstructs the two haplotypes directly from read alignments at variant loci. We introduce Ralphi, a novel deep reinforcement learning framework for haplotype assembly, which integrates the representational power of deep learning with reinforcement learning to accurately partition read fragments into their respective haplotype sets. To set the reward objective for reinforcement learning, our approach uses the classic reduction of the problem to the maximum fragment cut formulation on fragment graphs, in which nodes correspond to reads and edge weights capture the conflict or agreement of the reads at shared variant sites. We train Ralphi on a diverse data set of fragment graph topologies derived from genomes in the 1000 Genomes Project. We show that Ralphi achieves lower error rates at comparable or longer haplotype block lengths over the state of the art for short and long reads at varying coverage in standard human genome benchmarks.
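
The fragment-graph reduction mentioned above can be sketched in a few lines. Each read is a node; an edge between two reads that share variant sites gets a weight equal to the number of conflicting sites minus the number of agreeing sites, so a high-weight cut separates reads that disagree. The partition below is found with a simple greedy local search, used here as a stand-in for illustration. It is not Ralphi's reinforcement learning agent, and the reads and weighting scheme are hypothetical simplifications.

```python
from itertools import combinations

def fragment_graph(reads):
    """Edge weight = (#conflicting shared sites) - (#agreeing shared sites)."""
    weights = {}
    for i, j in combinations(range(len(reads)), 2):
        shared = reads[i].keys() & reads[j].keys()
        if shared:
            conflicts = sum(reads[i][s] != reads[j][s] for s in shared)
            weights[(i, j)] = conflicts - (len(shared) - conflicts)
    return weights

def cut_value(weights, side):
    """Total weight of edges whose endpoints lie on different sides."""
    return sum(w for (i, j), w in weights.items() if side[i] != side[j])

def greedy_max_cut(n_reads, weights):
    """Flip one read at a time while a flip strictly increases the cut."""
    side = [0] * n_reads
    improved = True
    while improved:
        improved = False
        for v in range(n_reads):
            gain = 0
            for (i, j), w in weights.items():
                if v in (i, j):
                    # flipping v cuts currently uncut edges and uncuts cut ones
                    gain += w if side[i] == side[j] else -w
            if gain > 0:
                side[v] ^= 1
                improved = True
    return side

# Toy reads as {variant site: observed allele}; true haplotypes are 0000 and 1111.
reads = [
    {0: 0, 1: 0},
    {1: 0, 2: 0},
    {0: 1, 1: 1},
    {1: 1, 2: 1, 3: 1},
    {2: 0, 3: 0},
]
weights = fragment_graph(reads)
side = greedy_max_cut(len(reads), weights)
print(side, cut_value(weights, side))
```

On this error-free toy input the search places reads 0, 1, and 4 on one side and reads 2 and 3 on the other, matching the two haplotypes. Greedy search can stall in local optima on noisy graphs, which is part of the motivation for learned policies.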

Unified integration of spatial transcriptomics across platforms with LLOKI

IF 7 | Zone 2, Biology | Q1 Biochemistry & Molecular Biology

Genome Research

Pub Date : 2025-11-13 DOI: 10.1101/gr.280803.125

Ellie Haber, Ajinkya Deshpande, Jian Ma, Spencer Krieger

Spatial transcriptomics (ST) has transformed our understanding of tissue architecture and cellular interactions, but integrating ST data across platforms remains challenging due to differences in gene panels, data sparsity, and technical variability. Here, we introduce LLOKI, a novel framework for integrating imaging-based ST data from diverse platforms without requiring shared gene panels. LLOKI addresses ST integration through two key alignment tasks: feature alignment across technologies and batch alignment across data sets. Optimal transport-guided feature propagation adjusts data sparsity to match scRNA-seq references through graph-based imputation, enabling single-cell foundation models such as scGPT to generate unified features. Batch alignment then refines scGPT-transformed embeddings, mitigating batch effects while preserving biological variability. Evaluations on mouse brain samples from five different technologies demonstrate that LLOKI outperforms existing methods and is effective for cross-technology spatial gene program identification and tissue slice alignment. Applying LLOKI to five ovarian cancer data sets, we identify an integrated gene program indicative of tumor-infiltrating T cells across gene panels. Together, LLOKI provides a robust foundation for cross-platform ST studies, with the potential to scale to large atlas data sets, enabling deeper insights into cellular organization and tissue environments.

Citations: 0

Modest increase in the de novo single nucleotide mutation rate in house mice born by assisted reproduction

IF 7 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Genome research

Pub Date : 2025-11-13 DOI: 10.1101/gr.281180.125

Laura Blanco-Berdugo, Alexis Garretson, Beth L Dumont

Approximately 2.6% of live births in the United States are conceived using assisted reproductive technologies (ART). While some ART, including in vitro fertilization (IVF) and intracytoplasmic sperm injection, are known to alter the epigenetic landscape of early embryonic development, their impact on DNA sequence stability is unclear. Here, we leverage the strengths of the laboratory mouse model system to investigate whether a standard ART series (ovarian hyperstimulation, gamete isolation, IVF, embryo culture, and embryo transfer) affects genome stability. Age-matched cohorts of 12 ART-derived and 16 naturally conceived C57BL/6J inbred mice were reared in a controlled setting and whole-genome sequenced to ~50× coverage. Using a rigorous pipeline for de novo single nucleotide variant (dnSNV) discovery, we observe a ~30% (95% CI: 4.5% - 56%) increase in the dnSNV rate in ART compared to naturally conceived mice (P = 0.017). Analysis of the dnSNV mutation spectrum identified signatures attributable to germline DNA repair activity but revealed no differentially enriched signatures between cohorts. We observe no enrichment of dnSNVs in specific genomic contexts, suggesting that the observed rate increase in ART-derived mice is a general genome-wide phenomenon. Together, our findings show that ART is moderately mutagenic in house mice and motivate future work to define the procedure(s) associated with this increased mutational vulnerability. While we caution that our findings cannot be immediately translated to humans, they nonetheless emphasize a pressing need for investigations on the potential mutagenicity of ART in our species.

Citations: 0

Label-free selection of marker genes in single-cell and spatial transcriptomics with geneCover

IF 7 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Genome research

Pub Date : 2025-11-12 DOI: 10.1101/gr.280539.125

An Wang, Stephanie Hicks, Donald Geman, Laurent Younes

The selection of marker gene panels is critical for capturing the cellular and spatial heterogeneity in the expanding atlases of single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics data. Most current approaches to marker gene selection operate in a label-based framework, which is inherently limited by its dependency on predefined cell type labels or clustering results. In contrast, existing label-free methods often struggle to identify genes that characterize rare cell types or subtle spatial patterns, and they frequently fail to scale efficiently with large data sets. Here, we introduce geneCover, a label-free combinatorial method that selects an optimal panel of minimally redundant marker genes based on gene-gene correlations. Our method demonstrates excellent scalability to large data sets and identifies marker gene panels that capture distinct correlation structures across the transcriptome. This allows geneCover to distinguish cell states in various tissues of living organisms effectively, including those associated with rare or otherwise difficult-to-identify cell types. We evaluate the performance of geneCover across various scRNA-seq and spatial transcriptomics data sets, comparing it to other label-free algorithms to highlight its utility and potential in diverse biological contexts.
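As a rough illustration of how a correlation-based covering selection can work, the sketch below picks genes greedily until every gene is correlated above a threshold with at least one selected gene. It is a greedy approximation written for illustration only, not the geneCover implementation. The function name `greedy_correlation_cover`, the `threshold` parameter, and the use of absolute Pearson correlation are all assumptions that do not come from the abstract.

```python
import numpy as np


def greedy_correlation_cover(expr, threshold=0.5):
    """Greedily select genes so that each gene is 'covered' by a selected one.

    expr: array of shape (n_cells, n_genes) with expression values.
    threshold: hypothetical cutoff on |Pearson r| above which one gene
        is considered to cover another.
    Returns the list of selected gene (column) indices.
    """
    expr = np.asarray(expr, dtype=float)
    # Constant genes produce NaN correlations; silence the warning and
    # treat those correlations as zero.
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(expr, rowvar=False)
    corr = np.nan_to_num(corr, nan=0.0)

    # covers[i, j] is True when selecting gene i covers gene j.
    covers = np.abs(corr) >= threshold
    # Every gene covers itself, which guarantees the loop terminates.
    np.fill_diagonal(covers, True)

    uncovered = np.ones(covers.shape[0], dtype=bool)
    selected = []
    while uncovered.any():
        # Number of still-uncovered genes each candidate would cover.
        gains = (covers & uncovered).sum(axis=1)
        best = int(np.argmax(gains))
        selected.append(best)
        uncovered &= ~covers[best]
    return selected
```

Raising `threshold` makes each gene cover fewer neighbours, so the selected panel grows. Lowering it yields a smaller, less redundant panel. An exact combinatorial solver could return a smaller panel than this greedy pass.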

Citations: 0

Secure phasing of private genomes in a trusted execution environment with TX-Phase

IF 7 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Genome research

Pub Date : 2025-11-12 DOI: 10.1101/gr.280558.125

Natnatee Dokmai, Kaiyuan Zhu, S. Cenk Sahinalp, Hyunghoon Cho

Genotype imputation servers enable researchers with limited resources to extract valuable insights from their data with enhanced accuracy and ease. However, the utility of these services is limited for those with sensitive study cohorts or those in restrictive regulatory environments owing to data privacy concerns. Although privacy-preserving analysis tools have been developed to broaden access to these servers, none of the existing methods support haplotype phasing, a critical component of the imputation workflow. The complexity of phasing algorithms poses a significant challenge in maintaining practical performance under privacy constraints. Here, we introduce TX-Phase, a secure haplotype phasing method based on the framework of trusted execution environments (TEEs). TX-Phase allows users’ private genomic data to be phased while ensuring data confidentiality and integrity of the computation. We introduce novel data-oblivious algorithmic techniques based on compressed reference panels and dynamic fixed-point arithmetic that comprehensively mitigate side-channel leakages in TEEs to provide robust protection of users’ genomic data throughout the analysis. Our experiments on a range of data sets from the UK Biobank and Haplotype Reference Consortium demonstrate the state-of-the-art phasing accuracy and practical runtimes of TX-Phase. Our work enables secure phasing of private genomes, opening access to large reference genomic data sets for a broader scientific community.
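To make the terms "data-oblivious" and "fixed-point arithmetic" concrete, the sketch below shows two generic building blocks: a branch-free select with an array read that touches every element regardless of the requested index, and multiplication of numbers stored as scaled integers. These are textbook patterns, not TX-Phase code. All function names and the choice of 16 fractional bits are assumptions, and the dynamic rescaling mentioned in the abstract is not modelled. Python gives no constant-time guarantees, so this illustrates the logic only.

```python
def oblivious_select(cond, a, b):
    """Return a if cond == 1 else b, without a data-dependent branch."""
    mask = -cond  # 0 when cond == 0, all ones (-1) when cond == 1
    return b ^ (mask & (a ^ b))


def oblivious_read(values, index):
    """Read values[index] while touching every element in the same order.

    The memory access pattern is the same for every index, which is the
    property that data-oblivious algorithms rely on.
    """
    result = 0
    for i, value in enumerate(values):
        is_hit = int(i == index)
        result = oblivious_select(is_hit, value, result)
    return result


FRAC_BITS = 16  # hypothetical number of fractional bits


def to_fixed(x, frac_bits=FRAC_BITS):
    """Encode a real number as an integer scaled by 2**frac_bits."""
    return int(round(x * (1 << frac_bits)))


def from_fixed(v, frac_bits=FRAC_BITS):
    """Decode a fixed-point integer back to a float."""
    return v / (1 << frac_bits)


def fixed_mul(a, b, frac_bits=FRAC_BITS):
    """Multiply two fixed-point values and rescale the product."""
    return (a * b) >> frac_bits
```

Integer arithmetic of this kind avoids floating-point operations whose timing can depend on the operand values, which is one reason fixed-point encodings appear in side-channel-resistant code.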

Citations: 0