Bioinformatics: Latest Articles

Revisiting Drug-Protein Interaction Prediction: A Novel Global-Local Perspective.
IF 5.8, Zone 3 (Biology), Q1 Mathematics, Pub Date: 2024-04-22, DOI: 10.1093/bioinformatics/btae271
Zhecheng Zhou, Qingquan Liao, Jinhang Wei, Linlin Zhuo, Xiaonan Wu, Xiangzheng Fu, Quan Zou
MOTIVATION: Accurate inference of potential drug-protein interactions (DPIs) aids in understanding drug mechanisms and developing novel treatments. Existing deep learning models, however, struggle with accurate node representation in DPI prediction, limiting their performance.
RESULTS: We propose a new computational framework that integrates global and local features of nodes in the drug-protein bipartite graph for efficient DPI inference. Initially, we employ pre-trained models to acquire fundamental knowledge of drugs and proteins and to determine their initial features. Subsequently, the MinHash and HyperLogLog algorithms are utilized to estimate the similarity and set cardinality between drug and protein subgraphs, serving as their local features. Then, an energy-constrained diffusion mechanism is integrated into the transformer architecture, capturing interdependencies between nodes in the drug-protein bipartite graph and extracting their global features. Finally, we fuse the local and global features of nodes and employ multi-layer perceptrons (MLPs) to predict the likelihood of potential DPIs. A comprehensive and precise node representation guarantees efficient prediction of unknown DPIs by the model. Various experiments validate the accuracy and reliability of our model, with molecular docking results revealing its capability to identify potential DPIs not present in existing databases. This approach is expected to offer valuable insights for furthering drug repurposing and personalized medicine research.
AVAILABILITY AND IMPLEMENTATION: Our code and data are accessible at https://github.com/ZZCrazy00/DPI.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
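To make the MinHash step concrete, the following is a minimal sketch of how a Jaccard similarity between two node neighbourhoods could be estimated from fixed-size signatures. It is not the authors' implementation: the function names, the use of salted BLAKE2b hashes, the signature length of 512, and the toy node sets are all assumptions made for illustration, and the HyperLogLog cardinality estimate is not shown.

```python
import hashlib


def _hash(seed: int, item: str) -> int:
    """Deterministic 64-bit hash of `item`, salted by `seed`."""
    digest = hashlib.blake2b(
        item.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items, num_hashes=512):
    """Keep the minimum hash of a non-empty set under each salted hash."""
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching slots estimates |A & B| / |A | B|."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


# Hypothetical neighbourhoods: node IDs around one drug and one protein.
drug_nodes = {f"n{i}" for i in range(0, 80)}
protein_nodes = {f"n{i}" for i in range(40, 120)}

sig_drug = minhash_signature(drug_nodes)
sig_protein = minhash_signature(protein_nodes)
estimate = estimate_jaccard(sig_drug, sig_protein)
exact = len(drug_nodes & protein_nodes) / len(drug_nodes | protein_nodes)
print(f"estimated {estimate:.3f}, exact {exact:.3f}")
```

The signatures have a fixed length regardless of neighbourhood size, which is what makes this kind of estimate cheap to compute for many node pairs.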
Citations: 0
MEG-PPIS: a fast protein-protein interaction site prediction method based on multi-scale graph information and equivariant graph neural network.
IF 5.8, Zone 3 (Biology), Q1 Mathematics, Pub Date: 2024-04-18, DOI: 10.1093/bioinformatics/btae269
Hongzhen Ding, Xue Li, Peifu Han, Xu Tian, Fengrui Jing, Shuang Wang, Tao Song, Hanjiao Fu, Na Kang
MOTIVATION: Protein-protein interaction sites (PPIS) are crucial for deciphering protein action mechanisms and for related medical research, and identifying them is a key issue in protein action research. Recent studies have shown that graph neural networks achieve outstanding performance in predicting PPIS. However, these studies often neglect the modeling of information at different scales in the graph and the symmetry of protein molecules within three-dimensional space.
RESULTS: In response to this gap, this paper proposes the MEG-PPIS approach, a PPIS prediction method based on multi-scale graph information and an E(n) equivariant graph neural network (EGNN). There are two channels in MEG-PPIS: the original graph and the subgraph obtained by graph pooling. The model iteratively updates the features of the original graph and subgraph through a weight-sharing EGNN. Subsequently, a max-pooling operation aggregates the updated features of the original graph and subgraph. Ultimately, the model feeds node features into the prediction layer to obtain prediction results. Comparative assessments against other methods on benchmark datasets reveal that MEG-PPIS achieves the best performance across all evaluation metrics and has the fastest runtime. Furthermore, specific case studies demonstrate that our method can predict more true positive and true negative sites than the current best method, indicating that our model achieves better performance in the PPIS prediction task.
AVAILABILITY AND IMPLEMENTATION: The data and code are available at https://github.com/dhz234/MEG-PPIS.git.
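The symmetry property that motivates an E(n) equivariant network can be illustrated with a heavily simplified coordinate update. The sketch below is an assumption-laden toy, not MEG-PPIS code: node features are single scalars, the learned message function is replaced by a fixed `tanh` of rotation-invariant quantities, and all names and values are hypothetical. It checks that rotating the input and then updating gives the same result as updating and then rotating.

```python
import math


def equivariant_update(coords, feats, scale=0.1):
    """x_i' = x_i + mean over j != i of (x_i - x_j) * gate(h_i, h_j, |x_i - x_j|^2)."""
    n = len(coords)
    updated = []
    for i in range(n):
        shift = [0.0, 0.0, 0.0]
        for j in range(n):
            if i == j:
                continue
            diff = [a - b for a, b in zip(coords[i], coords[j])]
            dist2 = sum(c * c for c in diff)
            # Toy stand-in for a learned MLP: depends only on invariants.
            gate = math.tanh(scale * (feats[i] * feats[j] - dist2))
            for k in range(3):
                shift[k] += diff[k] * gate
        updated.append(tuple(x + s / (n - 1) for x, s in zip(coords[i], shift)))
    return updated


def rotate_z(point, theta):
    """Rotate a 3D point about the z axis by `theta` radians."""
    x, y, z = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)


# Hypothetical residue coordinates and scalar features.
coords = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (0.2, 1.5, 1.0), (2.0, 0.1, 0.7)]
feats = [0.3, -1.2, 0.8, 0.5]
theta = 0.9

rotated_then_updated = equivariant_update([rotate_z(p, theta) for p in coords], feats)
updated_then_rotated = [rotate_z(p, theta) for p in equivariant_update(coords, feats)]
max_gap = max(
    abs(a - b)
    for p, q in zip(rotated_then_updated, updated_then_rotated)
    for a, b in zip(p, q)
)
print(f"largest coordinate mismatch: {max_gap:.2e}")
```

Because the gate depends only on features and squared distances, and the update moves each point along difference vectors, the two orders of operations agree up to floating-point error.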
Citations: 0
ITree: a user-driven tool for interactive decision-making with classification trees.
IF 5.8, Zone 3 (Biology), Q1 Mathematics, Pub Date: 2024-04-18, DOI: 10.1093/bioinformatics/btae273
Hubert Sokołowski, M. Czajkowski, Anna Czajkowska, K. Jurczuk, M. Kretowski
MOTIVATION: ITree is an intuitive web tool for the manual, semi-automatic, and automatic induction of decision trees. It enables interactive modifications of tree structures and incorporates Relative Expression Analysis for detecting complex patterns in high-throughput molecular data. This makes ITree a versatile tool for both research and education in biomedical data analysis.
RESULTS: The tool allows users to instantly see the effects of modifications on decision trees, with updates to predictions and statistics displayed in real time, facilitating a deeper understanding of data classification processes.
AVAILABILITY AND IMPLEMENTATION: Available online at https://itree.wi.pb.edu.pl. Source code and documentation are hosted on GitHub at https://github.com/hsokolowski/iTree.
SUPPLEMENTARY INFORMATION: Additional resources are provided to enhance user experience and support.
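Relative expression approaches compare the ordering of features within a sample rather than their raw values. As a rough illustration, the sketch below scores gene pairs in the top-scoring-pair style: a pair is informative when "gene i is lower than gene j" holds in most samples of one class and few of the other. This is an assumed, simplified form chosen for illustration and not ITree's implementation; the function names and the toy expression values are hypothetical.

```python
from itertools import combinations


def pair_score(samples, labels, i, j):
    """|P(x_i < x_j | class A) - P(x_i < x_j | class B)| for a two-class problem."""
    class_a, class_b = sorted(set(labels))  # assumes exactly two classes

    def fraction(cls):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        return sum(r[i] < r[j] for r in rows) / len(rows)

    return abs(fraction(class_a) - fraction(class_b))


def best_pair(samples, labels):
    """Return the feature pair whose ordering best separates the two classes."""
    n_features = len(samples[0])
    return max(
        combinations(range(n_features), 2),
        key=lambda pair: pair_score(samples, labels, *pair),
    )


# Hypothetical expression values for three genes in six samples.
samples = [
    [1.0, 2.0, 1.5],
    [0.5, 3.0, 4.0],
    [2.0, 2.5, 0.1],
    [3.0, 1.0, 2.0],
    [4.0, 0.5, 0.2],
    [2.2, 2.0, 5.0],
]
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]

print(best_pair(samples, labels), pair_score(samples, labels, 0, 1))
```

A rule such as "gene 0 < gene 1" can then serve as a split in a decision tree node, and it is unaffected by any monotone per-sample rescaling of the measurements.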
Citations: 0
Peptide Set Test: a Peptide-Centric Strategy to Infer Differentially Expressed Proteins.
IF 5.8, Zone 3 (Biology), Q1 Mathematics, Pub Date: 2024-04-17, DOI: 10.1093/bioinformatics/btae270
Junmin Wang, Steven Novick
MOTIVATION: The clinical translation of mass spectrometry-based proteomics has been challenging due to limited statistical power caused by large technical variability and inter-patient heterogeneity. Bottom-up proteomics provides an indirect measurement of proteins through digested peptides. This raises the question of whether peptide measurements can be used directly to better distinguish differentially expressed proteins.
RESULTS: We present a novel method called the peptide set test, which detects coordinated changes in the expression of peptides originating from the same protein and compares them to the rest of the peptidome. Applying our method to data from a published spike-in experiment and simulations demonstrates improved sensitivity without compromising precision, compared to aggregation-based approaches. Additionally, applying the peptide set test to compare the tumor proteomes of tamoxifen-sensitive and tamoxifen-resistant breast cancer patients reveals significant alterations in peptide levels of collagen XII, suggesting an association between collagen XII-mediated matrix reassembly and tamoxifen resistance. Our study establishes the peptide set test as a powerful peptide-centric strategy to infer differential expression in proteomics studies.
AVAILABILITY: Peptide Set Test (PepSetTest) is publicly available at https://github.com/JmWangBio/PepSetTest.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
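The idea of comparing one protein's peptides against the rest of the peptidome can be sketched as a competitive set statistic. The example below is a simplified illustration under stated assumptions, not the PepSetTest method: it uses a plain Welch-style statistic on per-peptide values, ignores any correlation between peptides, and the function name, peptide IDs, and numbers are hypothetical.

```python
import math
from statistics import mean, variance


def competitive_set_statistic(stat_by_peptide, peptide_set):
    """Welch-style statistic: peptides in the set versus all other peptides.

    Needs at least two peptides inside and outside the set.
    """
    inside = [v for k, v in stat_by_peptide.items() if k in peptide_set]
    outside = [v for k, v in stat_by_peptide.items() if k not in peptide_set]
    std_err = math.sqrt(
        variance(inside) / len(inside) + variance(outside) / len(outside)
    )
    return (mean(inside) - mean(outside)) / std_err


# Hypothetical per-peptide log fold changes between two conditions.
log_fold_changes = {
    "pepA1": 2.1, "pepA2": 1.9, "pepA3": 2.3,  # peptides from protein A
    "pepB1": 0.1, "pepB2": -0.2, "pepC1": 0.0,
    "pepC2": 0.3, "pepD1": -0.1, "pepD2": -0.1,
}
protein_a = {"pepA1", "pepA2", "pepA3"}

score = competitive_set_statistic(log_fold_changes, protein_a)
print(f"set statistic for protein A: {score:.2f}")
```

A large positive or negative statistic suggests that the protein's peptides shift together relative to the background; a real analysis would also need a calibrated null distribution and multiple-testing control.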
Citations: 0
GAUSS: a summary-statistics-based R package for accurate estimation of linkage disequilibrium for variants, gaussian imputation and TWAS analysis of cosmopolitan cohorts.
IF 5.8, Zone 3 (Biology), Q1 Mathematics, Pub Date: 2024-04-17, DOI: 10.1093/bioinformatics/btae203
Donghyung Lee, S. Bacanu
MOTIVATION: As the availability of larger and more ethnically diverse reference panels grows, there is an increase in demand for ancestry-informed imputation of genome-wide association studies (GWAS), and other downstream analyses, e.g., fine-mapping. Performing such analyses at the genotype level is computationally challenging and necessitates, at best, a laborious process to access individual-level genotype and phenotype data. Summary-statistics-based tools, not requiring individual-level data, provide an efficient alternative that streamlines computational requirements and promotes open science by simplifying the re-analysis and downstream analysis of existing GWAS summary data. However, existing tools perform only disparate parts of the needed analysis, have only command-line interfaces, and are difficult for applied researchers to extend or link.
RESULTS: To address these challenges, we present GAUSS, a comprehensive and user-friendly R package designed to facilitate the re-analysis and downstream analysis of GWAS summary statistics. GAUSS offers an integrated toolkit for a range of functionalities, including i) estimating ancestry proportion of study cohorts, ii) calculating ancestry-informed linkage disequilibrium, iii) imputing summary statistics of unobserved variants, iv) conducting transcriptome-wide association studies, and v) correcting for "Winner's Curse" biases. Notably, GAUSS utilizes an expansive, multi-ethnic reference panel consisting of 32,953 genomes from 29 ethnic groups. This panel enhances the range and accuracy of imputable variants, including the ability to impute summary statistics of rarer variants. As a result, GAUSS elevates the quality and applicability of existing GWAS analyses without requiring access to subject-level genotypic and phenotypic information.
AVAILABILITY AND IMPLEMENTATION: The GAUSS R package, complete with its source code, is readily accessible to the public via our GitHub repository at https://github.com/statsleelab/gauss. To further assist users, we provided illustrative use-case scenarios that are conveniently found at https://statsleelab.github.io/gauss/, along with a comprehensive user guide detailed in Supplementary Text 1 from Supplementary Data.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
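Imputing a summary statistic for an unobserved variant is commonly done with the conditional mean of a multivariate normal: given observed Z-scores z_o, the LD matrix among observed variants S_oo, and the LD vector S_uo between the unobserved and observed variants, the imputed value is S_uo S_oo^{-1} z_o. The sketch below illustrates that formula only. It is written in Python rather than R, it does not use or mirror the GAUSS API, and the function names, the ridge term, and the toy LD values are assumptions.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def impute_z(ld_obs, ld_unobs_obs, z_obs, ridge=0.1):
    """Conditional-mean imputation: S_uo (S_oo + ridge * I)^{-1} z_o."""
    n = len(z_obs)
    regularised = [
        [ld_obs[i][j] + (ridge if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    weights = solve(regularised, z_obs)
    return sum(r * w for r, w in zip(ld_unobs_obs, weights))


# Hypothetical LD correlations and observed Z-scores for two variants.
ld_obs = [[1.0, 0.3], [0.3, 1.0]]
ld_unobs_obs = [0.8, 0.4]
z_obs = [4.0, 2.0]

print(f"imputed Z-score: {impute_z(ld_obs, ld_unobs_obs, z_obs):.3f}")
```

The ridge term is a common safeguard against a near-singular LD matrix estimated from a finite reference panel; setting it to zero recovers the plain conditional mean.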
Citations: 0
Efficient cytometry analysis with FlowSOM in python boosts interoperability with other single-cell tools.
IF 5.8, Zone 3 (Biology), Q1 Mathematics, Pub Date: 2024-04-17, DOI: 10.1093/bioinformatics/btae179
Artuur Couckuyt, Benjamin Rombaut, Yvan Saeys, S. van Gassen
MOTIVATION: We describe a new Python implementation of FlowSOM, a clustering method for cytometry data.
RESULTS: This implementation is faster than the original version in R, is better adapted to work with single-cell omics data, including integration with current single-cell data structures, and includes all the original visualizations, such as the star and pie plots.
AVAILABILITY: The FlowSOM Python implementation is freely available on GitHub: https://github.com/saeyslab/FlowSOM_Python.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Citations: 0
TransGEM: a molecule generation model based on transformer with gene expression data.
IF 5.8, Zone 3 (Biology), Q1 Mathematics, Pub Date: 2024-04-17, DOI: 10.1093/bioinformatics/btae189
Yanguang Liu, Hailong Yu, Xinya Duan, Xiaomin Zhang, Ting Cheng, Feng Jiang, Hao Tang, Yao Ruan, Miao Zhang, Hongyu Zhang, Qingye Zhang
MOTIVATION: It is difficult to generate new molecules with desirable bioactivity through ligand-based de novo drug design, and receptor-based de novo drug design is constrained by the availability of disease target information. The combination of artificial intelligence and phenotype-based de novo drug design can generate new bioactive molecules independently of disease target information. Gene expression profiles can be used to characterize biological phenotypes. The Transformer model can be utilized to capture the associations between gene expression profiles and molecular structures due to its remarkable ability to process contextual information.
RESULTS: We propose TransGEM (Transformer-based model from gene expression to molecules), which is a phenotype-based de novo drug design model. A specialized gene expression encoder is employed to embed gene expression difference values between diseased cell lines and their corresponding normal tissue cells into the TransGEM model. The results demonstrate that the TransGEM model can generate molecules with desirable evaluation metrics and property distributions. Case studies illustrate that the TransGEM model can generate structurally novel molecules with good binding affinity to disease target proteins. The majority of genes with high attention scores obtained from the TransGEM model are associated with the onset of the disease, indicating the potential of these genes as disease targets. Therefore, this study provides a new paradigm for de novo drug design, and it will promote phenotype-based drug discovery.
AVAILABILITY: The code is available at https://github.com/hzauzqy/TransGEM.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Citations: 0
wgd v2: a suite of tools to uncover and date ancient polyploidy and whole-genome duplication.
IF 5.8 Zone 3 (Biology) Q1 Mathematics Pub Date: 2024-04-17 DOI: 10.1093/bioinformatics/btae272
Hen-Huang Chen, A. Zwaenepoel, Yves Van de Peer
MOTIVATION: Major improvements in sequencing technologies and genome sequence assembly have led to a huge increase in the number of available genome sequences. In turn, these genome sequences form an invaluable source for evolutionary, ecological, and comparative studies. One kind of analysis that has become routine is the search for traces of ancient polyploidy, particularly for plant genomes, where whole-genome duplication (WGD) is rampant.
RESULTS: Here, we present wgd v2, a major update of the previously developed tool wgd, to look for remnants of ancient polyploidy, or WGD. We implemented novel tools and improved previously developed ones to (a) construct KS age distributions for the whole paranome (the collection of all duplicated genes in a genome), (b) unravel intra- and inter-genomic collinearity resulting from WGDs, (c) fit mixture models to age distributions of gene duplicates, (d) correct for substitution rate variation for the phylogenetic placement of WGDs, and (e) date ancient WGDs via phylogenetic dating of WGD-retained gene duplicates. The applicability and feasibility of wgd v2 for the identification and the relative and absolute dating of ancient WGDs are demonstrated using different plant genomes.
AVAILABILITY: wgd v2 is open source and available at https://github.com/heche-psb/wgd.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Citations: 0
AbLEF: Antibody Language Ensemble Fusion for thermodynamically empowered property predictions.
IF 5.8 Zone 3 (Biology) Q1 Mathematics Pub Date: 2024-04-16 DOI: 10.1093/bioinformatics/btae268
Zachary A Rollins, Talal Widatalla, Andrew Waight, Alan C Cheng, Essam Metwally
MOTIVATION: Pre-trained protein language and/or structural models are often fine-tuned on drug development properties (i.e., developability properties) to accelerate drug discovery initiatives. However, these models generally rely on a single structural conformation and/or a single sequence as a molecular representation. We present a physics-based model whereby 3D conformational ensemble representations are fused by a transformer-based architecture and concatenated to a language representation to predict antibody protein properties. AbLEF enables the direct infusion of thermodynamic information into latent space, and this enhances property prediction by explicitly capturing the dynamic molecular behavior that occurs during experimental measurement.
RESULTS: We showcase the AbLEF model on two developability properties: hydrophobic interaction chromatography retention time (HIC-RT) and temperature of aggregation (Tagg). We find that (1) 3D conformational ensembles generated from molecular simulation can further improve antibody property prediction for small datasets, (2) the performance benefit from 3D conformational ensembles matches shallow machine learning methods in the small data regime, and (3) fine-tuned large protein language models can match smaller antibody-specific language models at predicting antibody properties.
AVAILABILITY AND IMPLEMENTATION: The AbLEF codebase is available at https://github.com/merck/AbLEF.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Citations: 0
scPRAM accurately predicts single-cell gene expression perturbation response based on attention mechanism.
IF 5.8 Zone 3 (Biology) Q1 Mathematics Pub Date: 2024-04-15 DOI: 10.1093/bioinformatics/btae265
Qun Jiang, Shengquan Chen, Xiaoyang Chen, Rui Jiang
MOTIVATION: With the rapid advancement of single-cell sequencing technology, it is gradually becoming possible to delve into cellular responses to various external perturbations at the gene expression level. However, obtaining perturbed samples in certain scenarios may be considerably challenging, and the substantial costs associated with sequencing also curtail the feasibility of large-scale experimentation. A repertoire of methodologies has been employed for forecasting perturbation responses in single-cell gene expression. However, existing methods primarily focus on the average response of a specific cell type to perturbation, overlooking the single-cell specificity of perturbation responses and a more comprehensive prediction of the entire perturbation response distribution.
RESULTS: Here we present scPRAM, a method for predicting Perturbation Responses in single-cell gene expression based on Attention Mechanisms. Leveraging variational autoencoders and optimal transport, scPRAM aligns cell states before and after perturbation, and then accurately predicts gene expression responses to perturbations for unseen cell types through attention mechanisms. Experiments on multiple real perturbation datasets involving drug treatments and bacterial infections demonstrate that scPRAM attains higher accuracy in perturbation prediction across cell types, species, and individuals than existing methodologies. Furthermore, scPRAM demonstrates outstanding capability in identifying differentially expressed genes under perturbation, capturing heterogeneity in perturbation responses across species, and maintaining stability in the presence of data noise and sample size variations.
AVAILABILITY AND IMPLEMENTATION: https://github.com/jiang-q19/scPRAM and https://doi.org/10.5281/zenodo.10935038.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Citations: 0