
Genome Research: latest articles

Genetic complexity of killer-cell immunoglobulin-like receptor genes in human pangenome assemblies
IF 7 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2024-09-09 DOI: 10.1101/gr.278358.123
Tsung-Kai Hung, Wan-Chi Liu, Sheng-Kai Lai, Hui-Wen Chuang, Yi-Che Lee, Hong-Ye Lin, Chia-Lang Hsu, Chien-Yu Chen, Ya-Chien Yang, Jacob Shujui Hsu, Pei-Lung Chen
The killer-cell immunoglobulin-like receptor (KIR) gene complex, a highly polymorphic region of the human genome that encodes proteins involved in immune responses, poses substantial challenges in genotyping owing to its remarkable genetic diversity and structural intricacy. Accurate analysis of KIR alleles, including their structural variations, is crucial for understanding their roles in various immune responses. Leveraging the high-quality genome assemblies from the Human Pangenome Reference Consortium (HPRC), we present a novel bioinformatic tool, the structural KIR annoTator (SKIRT), to investigate gene diversity and facilitate precise KIR allele analysis. In 47 HPRC-phased assemblies, SKIRT identifies a recurrent novel KIR2DS4/3DL1 fusion gene in the paternal haplotype of HG02630 and the maternal haplotype of NA19240. Additionally, SKIRT accurately identifies eight structural variants and 15 novel nonsynonymous alleles, all of which are independently validated using short-read data or quantitative polymerase chain reaction. Our study discovers a total of 570 novel alleles. Eight haplotypes harbor at least one KIR gene duplication, six haplotypes have lost at least one framework gene, and 75 out of 94 haplotypes (79.8%) carry at least five novel alleles, confirming the extent of KIR genetic diversity. These findings provide insights into KIR gene diversity and serve as a solid foundation for understanding the functional consequences of KIR structural variations. High-resolution genome assemblies offer unprecedented opportunities to explore polymorphic regions that are challenging to investigate using short-read sequencing methods. The SKIRT pipeline is a highly efficient tool that enables comprehensive detection of the complete spectrum of KIR alleles within human genome assemblies.
Citations: 0
Privacy-preserving biological age prediction over federated human methylation data using fully homomorphic encryption
IF 7 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2024-09-05 DOI: 10.1101/gr.279071.124
Meir Goldenberg, Loay Mualem, Amit Shahar, Sagi Snir, Adi Akavia
DNA methylation data play a crucial role in estimating chronological age in mammals, offering real-time insights into an individual’s aging process. The Epigenetic Pacemaker (EPM) model allows inference of the biological age as deviations from the population trend. Given the sensitivity of these data, it is essential to safeguard both the inputs and outputs of the EPM model. A recent study introduced a privacy-preserving approach for EPM computation using Fully Homomorphic Encryption (FHE). However, that method had limitations, including high communication complexity and impracticality for large datasets. Our work presents a new privacy-preserving protocol for EPM computation, analytically improving both privacy and complexity. Notably, we employ a single server for the secure computation phase while ensuring privacy even in the event of server corruption (compared to requiring two non-colluding servers). Using techniques from symbolic algebra and number theory, the new protocol eliminates the need for communication during secure computation, significantly improves asymptotic runtime, and offers better compatibility with parallel computing for further time complexity reduction. We have implemented our protocol, demonstrating its ability to produce results similar to the standard (insecure) EPM model with substantial performance improvement compared to previous methods. These findings hold promise for enhancing data security in medical applications where personal privacy is paramount. The generality of both the new approach and the EPM suggests that this protocol may be useful for other applications employing similar expectation maximization techniques.
Citations: 0
A Bayesian framework for inferring dynamic intercellular interactions from time-series single-cell data
IF 7 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2024-09-05 DOI: 10.1101/gr.279126.124
Cameron Y Park, Shouvik Mani, Nicolas Beltran-Velez, Katie Maurer, Teddy Huang, Shuqiang Li, Satyen Gohil, Kenneth J Livak, David A Knowles, Catherine J Wu, Elham Azizi
Characterizing cell-cell communication and tracking its variability over time are crucial for understanding the coordination of biological processes mediating normal development, disease progression, and responses to perturbations such as therapies. Existing tools fail to capture time-dependent intercellular interactions, and primarily rely on existing databases compiled from limited contexts. We introduce DIISCO, a Bayesian framework designed to characterize the temporal dynamics of cellular interactions using single-cell RNA sequencing data from multiple time points. Our method utilizes structured Gaussian process regression to unveil time-resolved interactions among diverse cell types according to their coevolution and incorporates prior knowledge of receptor-ligand complexes. We show the interpretability of DIISCO in simulated data and new data collected from T cells co-cultured with lymphoma cells, demonstrating its potential to uncover dynamic cell-cell crosstalk.
Citations: 0
Protein domain embeddings for fast and accurate similarity search
IF 7 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2024-09-05 DOI: 10.1101/gr.279127.124
Benjamin Giovanni Iovino, Haixu Tang, Yuzhen Ye
Recently developed protein language models have enabled a variety of applications with the protein contextual embeddings they produce. Per-protein representations (each protein is represented as a vector of fixed dimension) can be derived by averaging the embeddings of individual residues, or by applying matrix transformation techniques such as the discrete cosine transformation to matrices of residue embeddings. Such protein-level embeddings have been applied to enable fast searches of similar proteins; however, limitations have been found. For example, PROST is good at detecting global homologs but not local homologs, and knnProtT5 excels for single-domain proteins but not multi-domain proteins. Here we propose a novel approach that first segments proteins into domains (or subdomains) and then applies the discrete cosine transformation to the vectorized embeddings of residues in each domain to infer domain-level contextual vectors. Our approach, called DCTdomain, utilizes predicted contact maps from ESM-2 for domain segmentation, which is formulated as a domain segmentation problem and can be solved using a recursive cut algorithm (RecCut for short) in time quadratic in the protein length; for comparison, an existing approach for domain segmentation uses a cubic-time algorithm. We showed that such domain-level contextual vectors (termed DCT fingerprints) enable fast and accurate detection of similarity between proteins that share global similarities but with undefined extended regions between shared domains, and those that only share local similarities. In addition, tests on a database search benchmark showed that DCTdomain was able to detect distant homologs by leveraging the structural information in the contextual embeddings.
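The idea of turning a variable-length run of residue embeddings into a fixed-size vector with the discrete cosine transformation can be illustrated with a minimal sketch. This is not the DCTdomain implementation: the function names, the `keep` parameter, and the length normalization are assumptions made for illustration, and the toy inputs stand in for real ESM-2 embeddings.

```python
import math


def dct2(signal):
    """Length-normalized DCT-II of a 1-D sequence (coefficient 0 is the mean)."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi * (t + 0.5) * k / n) for t, x in enumerate(signal)) / n
        for k in range(n)
    ]


def domain_fingerprint(embeddings, keep=3):
    """Compress an L x D residue-embedding matrix into a vector of keep * D values.

    Each embedding dimension is treated as a signal along the residue axis and
    only its `keep` lowest-frequency DCT coefficients are retained, so domains
    of different lengths map to vectors of the same size.
    """
    n_dims = len(embeddings[0])
    fingerprint = []
    for d in range(n_dims):
        column = [row[d] for row in embeddings]
        coeffs = dct2(column)
        # Zero-pad when the domain is shorter than `keep` residues.
        coeffs += [0.0] * max(0, keep - len(coeffs))
        fingerprint.extend(coeffs[:keep])
    return fingerprint


# Hypothetical 2-D "embeddings" for two domains of different lengths.
short_domain = [[1.0, 2.0]] * 4
long_domain = [[1.0, 2.0]] * 6
print(len(domain_fingerprint(short_domain)), len(domain_fingerprint(long_domain)))
```

Because both toy domains are compressed to the same number of values, they can be compared directly with a vector distance, which is what makes fingerprint-based search fast.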
Citations: 0
Spatial Cellular Networks from omics data with SpaCeNet
IF 7 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2024-09-04 DOI: 10.1101/gr.279125.124
Stefan Schrod, Niklas Lück, Robert Lohmayer, Stefan Solbrig, Dennis Völkl, Tina Wipfler, Katherine H. Shutta, Marouen Ben Guebila, Andreas Schäfer, Tim Beißbarth, Helena U. Zacharias, Peter Oefner, John Quackenbush, Michael Altenbuchinger
Advances in omics technologies have allowed spatially resolved molecular profiling of single cells, providing a window not only into the diversity and distribution of cell types within a tissue but also into the effects of interactions between cells in shaping the transcriptional landscape. Cells send chemical and mechanical signals which are received by other cells, where they can subsequently initiate context-specific gene regulatory responses. These interactions and their responses shape the individual molecular phenotype of a cell in a given microenvironment. RNAs or proteins measured in individual cells, together with the cells' spatial distribution, provide invaluable information about these mechanisms and the regulation of genes beyond processes occurring independently in each individual cell. SpaCeNet is a method designed to elucidate both the intracellular molecular networks (how molecular variables affect each other within the cell) and the intercellular molecular networks (how cells affect molecular variables in their neighbors). This is achieved by estimating conditional independence relations between captured variables within individual cells and by disentangling these from conditional independence relations between variables of different cells.
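The notion of a conditional independence relation can be made concrete with a small generic example. The sketch below is not SpaCeNet itself, which estimates a spatially aware model. It only uses the standard first-order partial correlation formula, under the assumption of jointly Gaussian variables, where a zero partial correlation corresponds to conditional independence. The variable roles (x, y, z) and the correlation values are hypothetical.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after conditioning on z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical scenario: x (a gene in one cell) and y (a gene in a neighboring
# cell) are correlated only because both depend on z, so r_xy = r_xz * r_yz.
r_xz, r_yz = 0.6, 0.8
r_xy = r_xz * r_yz

print("marginal correlation:", r_xy)
print("partial correlation given z:", partial_correlation(r_xy, r_xz, r_yz))
```

The marginal correlation is clearly nonzero, while the partial correlation is numerically zero: the apparent association disappears once z is accounted for, which is the kind of disentangling the abstract refers to.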
Citations: 0
Matrix sketching framework for linear mixed models in association studies
IF 7 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2024-09-04 DOI: 10.1101/gr.279230.124
Myson C Burch, Aritra Bose, Gregory Dexter, Laxmi Parida, Petros Drineas
Linear mixed models (LMMs) have been widely used in genome-wide association studies (GWAS) to control for population stratification and cryptic relatedness. However, estimating LMM parameters is computationally expensive, necessitating large-scale matrix operations to build the genetic relatedness matrix (GRM). Over the past 25 years, Randomized Linear Algebra has provided alternative approaches to such matrix operations by leveraging matrix sketching, which often results in provably accurate, fast, and efficient approximations. We leverage matrix sketching to develop a fast and efficient LMM method called Matrix-Sketching LMM (MaSk-LMM), which sketches the genotype matrix to reduce its dimensions and speed up computations. Our framework comes with both theoretical guarantees and strong empirical performance compared to the current state of the art for simulated traits and complex diseases.
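A rough sketch of what sketching a genotype matrix can look like follows. It is a generic random-sign sketch of the marker dimension, not the MaSk-LMM algorithm, whose exact sketching scheme is not described in the abstract. The function names and the tiny genotype matrix are hypothetical, and the column standardization usually applied before building a GRM is omitted for brevity.

```python
import random


def sketch_columns(genotypes, sketch_size, seed=0):
    """Compress an n x m matrix to n x sketch_size with a random sign matrix."""
    rng = random.Random(seed)
    n_markers = len(genotypes[0])
    scale = 1.0 / sketch_size ** 0.5
    # S is m x s with entries +-1/sqrt(s), so that E[S S^T] = I.
    signs = [[rng.choice((-scale, scale)) for _ in range(sketch_size)]
             for _ in range(n_markers)]
    return [[sum(row[j] * signs[j][k] for j in range(n_markers))
             for k in range(sketch_size)]
            for row in genotypes]


def relatedness(matrix, n_markers):
    """Gram matrix (X X^T) / n_markers of an n x p matrix X."""
    return [[sum(a * b for a, b in zip(r1, r2)) / n_markers for r2 in matrix]
            for r1 in matrix]


# Hypothetical genotypes: 3 individuals x 8 markers, coded 0/1/2.
genotypes = [
    [0, 1, 2, 0, 1, 1, 0, 2],
    [0, 1, 2, 0, 1, 1, 0, 2],
    [2, 0, 0, 1, 2, 0, 1, 0],
]
exact = relatedness(genotypes, n_markers=8)
sketched = sketch_columns(genotypes, sketch_size=4)
approx = relatedness(sketched, n_markers=8)
print(exact[0][2], approx[0][2])
```

With only eight markers and a sketch of size four the approximation is crude. The point is that the Gram matrix is built from an n x s matrix instead of an n x m one, which matters when m is in the millions.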
Citations: 0
A best-match approach for gene set analysis in embedding spaces
IF 7 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2024-09-04 DOI: 10.1101/gr.279141.124
Lechuan Li, Ruth Dannenfelser, Charlie Cruz, Vicky Yao
Embedding methods have emerged as a valuable class of approaches for distilling essential information from complex high-dimensional data into more accessible lower-dimensional spaces. Applications of embedding methods to biological data have demonstrated that gene embeddings can effectively capture physical, structural, and functional relationships between genes. However, this utility has been primarily realized by using gene embeddings for downstream machine learning tasks. Much less has been done to examine the embeddings directly, especially analyses of gene sets in embedding spaces. Here, we propose ANDES, a novel best-match approach that can be used with existing gene embeddings to compare gene sets while reconciling gene set diversity. This intuitive method has important downstream implications for improving the utility of embedding spaces for various tasks. Specifically, we show how ANDES, when applied to different gene embeddings encoding protein-protein interactions, can be used as a novel overrepresentation-based and rank-based gene set enrichment analysis method that achieves state-of-the-art performance. Additionally, ANDES can use multi-organism joint gene embeddings to facilitate functional knowledge transfer across organisms, allowing for phenotype mapping across model systems. Our flexible, straightforward best-match methodology can be extended to other embedding spaces with diverse community structures between set elements.
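The best-match idea can be sketched in a few lines: for each gene in one set, take its most similar gene in the other set, then average those best matches in both directions. This is an illustrative reading of the abstract, not the ANDES implementation, which may include further steps (for example, calibration against a background distribution). The gene names and 2-D embedding vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def best_match_similarity(set_a, set_b, embedding):
    """Average, in both directions, of each gene's best cosine match in the other set."""
    def one_way(source, target):
        return sum(
            max(cosine(embedding[g], embedding[h]) for h in target) for g in source
        ) / len(source)

    return 0.5 * (one_way(set_a, set_b) + one_way(set_b, set_a))


# Hypothetical embedding: each set contains two functionally distinct genes.
embedding = {
    "g1": [1.0, 0.0], "g2": [0.0, 1.0],
    "g3": [2.0, 0.0], "g4": [0.0, 3.0],
}
set_a, set_b = ["g1", "g2"], ["g3", "g4"]

all_pairs = [cosine(embedding[g], embedding[h]) for g in set_a for h in set_b]
print("mean of all pairs:", sum(all_pairs) / len(all_pairs))
print("best-match score:", best_match_similarity(set_a, set_b, embedding))
```

In this toy case every gene has a perfect counterpart in the other set, yet averaging over all pairs dilutes the score because each set is internally diverse. Scoring only the best matches avoids that dilution.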
Citations: 0
Genome-wide patterns of selection-drift variation strongly associate with organismal traits across the green plant lineage
IF 7 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2024-08-29 DOI: 10.1101/gr.279002.124
Kavitha Uthanumallian, Andrea Del Cortona, Susana Coelho, Olivier De Clerck, Sebastian Duchene, Heroen Verbruggen
There are many gaps in our knowledge of how life cycle variation and organismal body architecture associate with molecular evolution. Using the diverse range of green algal body architectures and life cycle types as a test case, we hypothesize that increases in cytomorphological complexity are likely to be associated with a decrease in the effective population size, since larger-bodied organisms typically have smaller populations, resulting in increased drift. For life cycles, we expect haploid-dominant lineages to evolve under stronger selection intensity relative to diploid-dominant life cycles due to masking of deleterious alleles in heterozygotes. We use a genome-scale dataset spanning the phylogenetic diversity of green algae and phylogenetic comparative approaches to measure the relative selection intensity across different trait categories. We show stronger signatures of drift in lineages with more complex body architectures compared to unicellular lineages, which we consider to be a consequence of smaller effective population sizes of the more complex algae. Significantly higher rates of synonymous as well as nonsynonymous substitutions relative to other algal body architectures highlight that siphonous and siphonocladous body architectures, characteristic of many green seaweeds, form an interesting test case to study the potential impacts of genome redundancy on molecular evolution. Contrary to expectations, we show that levels of selection efficacy do not show a strong association with life cycle types in green algae. Taken together, our results underline the prominent impact of body architecture on the molecular evolution of green algal genomes.
Citations: 0
Memory-bound k-mer selection for large and evolutionary diverse reference libraries
IF 7 Zone 2 Biology Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2024-08-29 DOI: 10.1101/gr.279339.124
Ali Osman Berk Sapci, Siavash Mirarab
Using k-mers to find sequence matches is increasingly used in many bioinformatic applications, including metagenomic sequence classification. The accuracy of these downstream applications relies on the density of the reference databases, which are rapidly growing. While the increased density provides hope for dramatic improvements in accuracy, scalability is a concern. The k-mers are kept in the memory during the query time, and saving all k-mers of these ever-expanding databases is fast becoming impractical. Several strategies for subsampling k-mers have been proposed, including minimizers and finding taxon-specific k-mers. However, we contend that these strategies are inadequate, especially when reference sets are taxonomically imbalanced, as are most microbial libraries. In this paper, we explore approaches for selecting a fixed-size subset of k-mers present in an ultra-large dataset to include in a library such that the classification of reads suffers the least. Our experiments demonstrate the limitations of existing approaches, especially for novel and poorly sampled groups. We propose a library construction algorithm called KRANK (K-mer RANKer) that combines several components, including a hierarchical selection strategy with adaptive size restrictions and an equitable coverage strategy. We implement KRANK in highly optimized code and combine it with the locality-sensitive-hashing classifier CONSULT-II to build a taxonomic classification and profiling method. On several benchmarks, KRANK k-mer selection dramatically reduces memory consumption with minimal loss in classification accuracy. We show in extensive analyses based on CAMI benchmarks that KRANK outperforms k-mer-based alternatives in terms of taxonomic profiling and comes close to the best marker-based methods in terms of accuracy.
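The abstract names minimizers as one of the existing k-mer subsampling strategies. As a rough illustration of that baseline only (not of KRANK itself), the sketch below keeps the smallest k-mer in each window of `w` consecutive k-mers. The function name `minimizers`, the parameters `k` and `w`, and the use of plain lexicographic ordering are assumptions made for this example; practical tools typically order k-mers by a hash value instead.

```python
def minimizers(seq: str, k: int, w: int) -> set[tuple[int, str]]:
    """Return the (position, k-mer) minimizers of seq.

    For every window of w consecutive k-mers, keep the lexicographically
    smallest k-mer (the leftmost one on ties). Lexicographic order is an
    assumption of this sketch; real tools usually rank k-mers by a hash.
    """
    # All overlapping k-mers, indexed by their start position in seq.
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first minimal index, so ties resolve leftmost.
        offset = min(range(w), key=lambda j: window[j])
        selected.add((start + offset, window[offset]))
    return selected


# Hypothetical toy input: 6 k-mers in total, of which 4 are retained.
example = minimizers("ACGTACGT", k=3, w=2)
print(sorted(example))
```

Because adjacent windows often share the same minimizer, the retained set is usually smaller than the full k-mer set. The choice is driven by local sequence content alone, with no regard to how well each taxon is represented in the library.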
Citations: 0
A scalable adaptive quadratic kernel method for interpretable epistasis analysis in complex traits
IF 7 Zone 2 Biology Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2024-08-29 DOI: 10.1101/gr.279140.124
Boyang Fu, Prateek Anand, Aakarsh Anand, Joel Mefford, Sriram Sankararaman
Our knowledge of the contribution of genetic interactions (epistasis) to variation in human complex traits remains limited, partly due to the lack of efficient, powerful, and interpretable algorithms to detect interactions. Recently proposed approaches for set-based association tests show promise in improving power to detect epistasis by examining the aggregated effects of multiple variants. Nevertheless, these methods either do not scale to large Biobank datasets or lack interpretability. We propose QuadKAST, a scalable algorithm that tests pairwise interaction effects (quadratic effects) within small- to medium-sized sets of genetic variants (≤ 100 SNPs) on a trait and provides a quantified interpretation of these effects. Comprehensive simulations showed that QuadKAST is well-calibrated. Additionally, QuadKAST is highly sensitive in detecting loci with epistatic signals and accurate in its estimation of quadratic effects. We applied QuadKAST to 52 quantitative phenotypes measured in ~300,000 unrelated white British individuals in the UK Biobank to test for quadratic effects within each of 9,515 protein-coding genes. We detected 32 trait-gene pairs across 17 traits and 29 genes that demonstrate statistically significant signals of quadratic effects ($p \le 0.05/(9{,}515 \times 52)$, accounting for the number of genes and traits tested). Across these trait-gene pairs, the proportion of trait variance explained by quadratic effects is similar to that explained by additive effects (median $\sigma^{2}_{\mathrm{quad}} / \sigma^{2}_{g} = 0.15$), with five pairs having a ratio greater than one. Our method enables the detailed investigation of epistasis on a large scale, offering new insights into its role and importance.
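The significance threshold quoted in the abstract is a Bonferroni correction over every gene-trait test. A minimal sketch of that arithmetic follows, using only the numbers stated above (a family-wise level of 0.05, 9,515 genes, 52 traits). The helper name `bonferroni_threshold` is hypothetical and is not part of QuadKAST.

```python
def bonferroni_threshold(alpha: float, n_genes: int, n_traits: int) -> float:
    """Per-test p-value cutoff after Bonferroni correction.

    Every gene is tested against every trait, so the number of tests
    is n_genes * n_traits (illustrative helper, not the authors' code).
    """
    n_tests = n_genes * n_traits
    return alpha / n_tests


n_tests = 9_515 * 52                              # 494,780 gene-trait tests
threshold = bonferroni_threshold(0.05, 9_515, 52)
print(n_tests)
print(f"{threshold:.2e}")
```

With 494,780 tests the per-test cutoff is roughly 1.0 × 10⁻⁷, which is the bar each of the 32 reported trait-gene pairs had to clear.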
Citations: 0