
Briefings in Bioinformatics: Latest Articles

ToxGIN: an In silico prediction model for peptide toxicity via graph isomorphism networks integrating peptide sequence and structure information.
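IF 6.8 | Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Vol. 25, Issue 6 | Pub Date: 2024-09-23 | DOI: 10.1093/bib/bbae583
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11555482/pdf/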
Qiule Yu, Zhixing Zhang, Guixia Liu, Weihua Li, Yun Tang

Peptide drugs have demonstrated enormous potential in treating a variety of diseases, yet toxicity prediction remains a significant challenge in drug development. Existing models for predicting peptide toxicity rely largely on sequence information and often neglect the three-dimensional (3D) structures of peptides. This study introduces ToxGIN, a novel model for short-peptide toxicity prediction that uses a Graph Isomorphism Network (GIN) to integrate the underlying amino acid sequence composition with the 3D structures of peptides. ToxGIN comprises three primary modules: (i) a sequence processing module, which converts peptide 3D structures and sequences into node and edge information; (ii) a feature extraction module, which uses GIN to learn discriminative features from nodes and edges; and (iii) a classification module, which employs a fully connected classifier for toxicity prediction. On the independent test set, ToxGIN achieved an F1 score of 0.83, an AUROC of 0.91, and a Matthews correlation coefficient of 0.68, outperforming existing models for peptide toxicity prediction. These results validate the effectiveness of using GIN to integrate 3D structural information with sequence data for peptide toxicity prediction. ToxGIN and its data are freely accessible at https://github.com/cihebiyql/ToxGIN.
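As background for the feature extraction module, the standard GIN node update adds a node's own embedding, scaled by (1 + ε), to the sum of its neighbors' embeddings and passes the result through an MLP. The pure-Python sketch below applies one such layer to a toy three-residue chain and pools the result with a sum readout. The graph, features, identity weights and single-layer ReLU "MLP" are illustrative assumptions; ToxGIN's actual node and edge featurization, its use of edge features, and its layer sizes are not reproduced here.

```python
from typing import Dict, List

Vector = List[float]


def mlp(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Single linear layer + ReLU, standing in for GIN's multilayer perceptron."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def gin_layer(features: Dict[int, Vector], adjacency: Dict[int, List[int]],
              weights: List[Vector], bias: Vector, eps: float = 0.0) -> Dict[int, Vector]:
    """One GIN update: h_v' = MLP((1 + eps) * h_v + sum of neighbor features)."""
    updated: Dict[int, Vector] = {}
    for v, h_v in features.items():
        agg = [(1.0 + eps) * x for x in h_v]
        for u in adjacency.get(v, []):
            agg = [a + x for a, x in zip(agg, features[u])]
        updated[v] = mlp(agg, weights, bias)
    return updated


def sum_readout(features: Dict[int, Vector]) -> Vector:
    """Graph-level embedding as the element-wise sum of node embeddings."""
    dims = len(next(iter(features.values())))
    return [sum(h[d] for h in features.values()) for d in range(dims)]


# Hypothetical 3-residue peptide graph (chain 0-1-2) with 2-dimensional node features.
feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
adj = {0: [1], 1: [0, 2], 2: [1]}
identity = [[1.0, 0.0], [0.0, 1.0]]
h1 = gin_layer(feats, adj, identity, [0.0, 0.0])
print(h1)               # {0: [1.0, 1.0], 1: [2.0, 2.0], 2: [1.0, 2.0]}
print(sum_readout(h1))  # [4.0, 5.0]
```

The sum aggregation (rather than mean or max) is what gives GIN its ability to distinguish neighborhoods that differ only in multiplicity; in a real model several such layers would be stacked with learned weights.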

Citations: 0
scTCA: a hybrid Transformer-CNN architecture for imputation and denoising of scDNA-seq data.
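IF 6.8 | Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Vol. 25, Issue 6 | Pub Date: 2024-09-23 | DOI: 10.1093/bib/bbae577
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11551055/pdf/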
Zhenhua Yu, Furui Liu, Yang Li

Single-cell DNA sequencing (scDNA-seq) has been widely used to reveal tumor copy number alterations (CNAs) at single-cell resolution. Although arm-level CNAs can be accurately detected from single-cell read counts, focal CNAs are difficult to identify precisely because the read counts are high-dimensional, highly sparse, and have a low signal-to-noise ratio. This creates a pressing need to reconstruct high-quality scDNA-seq data. We develop a new method, scTCA, for imputation and denoising of single-cell read counts, thereby aiding downstream analysis of both arm-level and focal CNAs. scTCA employs a hybrid Transformer-CNN architecture to capture local and non-local correlations between genes for precise recovery of the read counts. Unlike conventional Transformers, the Transformer block in scTCA is a two-stage attention module, consisting of a stepwise self-attention layer and a window Transformer, that can efficiently handle high-dimensional read-count data. We demonstrate the superior performance of scTCA through comparisons with state-of-the-art methods on both synthetic and real datasets. The results indicate that scTCA is highly effective for imputation and denoising of scDNA-seq data.
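The window Transformer in scTCA is described only at a high level. As a rough illustration of why windowing helps with very long read-count vectors, the hypothetical sketch below runs scaled dot-product self-attention independently inside non-overlapping windows of a sequence of bin embeddings, so the cost grows roughly with n·w instead of n² for n bins and window size w. It assumes Q = K = V (no learned projections), a single head, and made-up inputs, and it omits the stepwise self-attention stage and the CNN branch; it is not scTCA's implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(tokens: Matrix) -> Matrix:
    """Scaled dot-product self-attention with Q = K = V = tokens (no learned projections)."""
    d = len(tokens[0])
    out: Matrix = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def window_attention(tokens: Matrix, window: int) -> Matrix:
    """Apply attention separately within consecutive non-overlapping windows."""
    out: Matrix = []
    for start in range(0, len(tokens), window):
        out.extend(attention(tokens[start:start + window]))
    return out


# Six genomic bins with hypothetical 2-dimensional embeddings, window of 3 bins.
bins = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [2.0, 2.0], [1.0, 0.0], [0.0, 0.0]]
result = window_attention(bins, window=3)
print(len(result))  # 6: one output vector per bin
```

Because each window is processed in isolation, changing a bin in the second window leaves the outputs of the first window unchanged; capturing longer-range (non-local) structure is what the additional attention stage and the CNN branch in the described architecture would be for.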

Citations: 0
mbDriver: identifying driver microbes in microbial communities based on time-series microbiome data.
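IF 6.8 | Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Vol. 25, Issue 6 | Pub Date: 2024-09-23 | DOI: 10.1093/bib/bbae580
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11551971/pdf/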
Xiaoxiu Tan, Feng Xue, Chenhong Zhang, Tao Wang

Alterations in human microbial communities are intricately linked to the onset and progression of diseases. Identifying the key microbes driving these community changes is crucial, as they may serve as valuable biomarkers for disease prevention, diagnosis, and treatment. However, effective methods for this task are still lacking, primarily because defining a driver microbe requires considering not only each microbe's individual contribution but also its interactions with other microbes. This paper introduces mbDriver, a novel framework for identifying driver microbes from microbiome abundance data collected at discrete time points. mbDriver comprises three main components: (i) preprocessing of time-series abundance data using smoothing splines based on the negative binomial distribution; (ii) parameter estimation for the generalized Lotka-Volterra (gLV) model using regularized least squares; and (iii) quantification of each microbe's contribution to the community's steady state by manipulating the causal graph implied by the gLV equations. The performance of the nonparametric spline-based denoising and of the regularized least squares estimation is comprehensively evaluated on simulated datasets, demonstrating superiority over existing methods. Furthermore, the practical applicability and effectiveness of mbDriver are showcased on a dietary fiber intervention dataset and an ulcerative colitis dataset. Notably, driver microbes identified in the dietary fiber intervention dataset exhibit significant effects on the abundances of short-chain fatty acids, while those identified in the ulcerative colitis dataset show significant correlations with metabolism-related pathways.
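Step (ii) fits the gLV model, dx_i/dt = x_i (r_i + Σ_j A_ij x_j). One common way to turn this into a linear problem, assumed here rather than taken from mbDriver, is to regress finite differences of ln x_i on [1, x_1, ..., x_n] with a ridge (L2) penalty on the interaction terms. The pure-Python sketch below recovers known parameters from noise-free simulated trajectories; the negative-binomial spline smoothing of step (i), how mbDriver chooses its penalty, and the causal-graph analysis of step (iii) are not reproduced, and all numbers are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy; inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_glv(series: List[Matrix], dt: float, lam: float) -> Tuple[Vector, Matrix]:
    """Ridge estimate of growth rates r and interactions A from several abundance series."""
    n = len(series[0][0])
    rows: Matrix = []
    targets: Matrix = [[] for _ in range(n)]
    for traj in series:
        for x_now, x_next in zip(traj, traj[1:]):
            rows.append([1.0] + x_now)
            for i in range(n):
                targets[i].append((math.log(x_next[i]) - math.log(x_now[i])) / dt)
    p = n + 1
    xtx = [[sum(row[a] * row[b] for row in rows) for b in range(p)] for a in range(p)]
    for k in range(1, p):  # penalize interactions only, not the intercept r_i
        xtx[k][k] += lam
    r_hat: Vector = []
    a_hat: Matrix = []
    for i in range(n):
        xty = [sum(row[a] * y for row, y in zip(rows, targets[i])) for a in range(p)]
        beta = solve(xtx, xty)
        r_hat.append(beta[0])
        a_hat.append(beta[1:])
    return r_hat, a_hat


def simulate(x0: Vector, r: Vector, A: Matrix, dt: float, steps: int) -> Matrix:
    """Euler steps in log space: ln x(t+dt) = ln x(t) + dt * (r + A x(t))."""
    traj = [x0]
    for _ in range(steps):
        x = traj[-1]
        traj.append([xi * math.exp(dt * (ri + sum(aij * xj for aij, xj in zip(Ai, x))))
                     for xi, ri, Ai in zip(x, r, A)])
    return traj


# Hypothetical two-species community observed from three starting points.
r_true = [1.0, 0.8]
A_true = [[-1.0, -0.5], [-0.3, -1.0]]
series = [simulate(x0, r_true, A_true, dt=0.1, steps=30)
          for x0 in ([0.1, 0.2], [0.5, 0.05], [0.9, 0.9])]
r_hat, A_hat = fit_glv(series, dt=0.1, lam=1e-9)
print([round(v, 3) for v in r_hat])  # should be close to r_true
```

On real data the log-differences are noisy, which is why smoothing the trajectories first and using a non-negligible penalty matter far more than in this noise-free toy.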

Citations: 0
Deciphering lineage-relevant gene regulatory networks during endoderm formation by InPheRNo-ChIP.
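IF 6.8 | Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Vol. 25, Issue 6 | Pub Date: 2024-09-23 | DOI: 10.1093/bib/bbae592
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11558691/pdf/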
Chen Su, William A Pastor, Amin Emad

Deciphering the gene regulatory networks (GRNs) that govern early human embryogenesis is critical for understanding developmental mechanisms, yet it remains challenging due to limited sample availability and the inherent complexity of the biological processes involved. To address this, we developed InPheRNo-ChIP, a computational framework that integrates multimodal data, including RNA-seq, transcription factor (TF)-specific ChIP-seq, and phenotypic labels, to reconstruct phenotype-relevant GRNs associated with endoderm development. The core of this method is a probabilistic graphical model that captures the simultaneous effect of TFs on their putative target genes in shaping a particular phenotypic outcome. Unlike most existing GRN inference methods, which are agnostic to phenotypic outcomes, InPheRNo-ChIP directly incorporates phenotypic information during GRN inference, enabling it to distinguish lineage-specific from general regulatory interactions. We integrated data from three experimental studies and applied InPheRNo-ChIP to infer the GRN governing the differentiation of human embryonic stem cells into definitive endoderm. Benchmarking against a scRNA-seq CRISPRi study demonstrated that InPheRNo-ChIP identifies regulatory interactions involving the endoderm markers FOXA2, SMAD2, and SOX17 better than other methods. This highlights the importance of incorporating phenotypic context during network inference. Furthermore, an ablation study confirms the synergistic contribution of ChIP-seq, RNA-seq, and phenotypic data, underscoring the value of multimodal integration for accurate reconstruction of phenotype-relevant GRNs.

Citations: 0
3t-seq: automatic gene expression analysis of single-copy genes, transposable elements, and tRNAs from RNA-seq data.
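IF 6.8 | Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Vol. 25, Issue 6 | Pub Date: 2024-09-23 | DOI: 10.1093/bib/bbae467
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11424182/pdf/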
Francesco Tabaro, Matthieu Boulard

RNA sequencing is the gold-standard method for quantifying transcriptomic changes between two conditions. The overwhelming majority of available data analysis methods focus on polyadenylated RNA transcribed from single-copy genes and overlook transcripts from repeated sequences such as transposable elements (TEs). These autonomous genetic elements are increasingly studied, and specialized tools designed to handle multimapping sequencing reads are available. Transfer RNAs (tRNAs) are transcribed by RNA polymerase III and are essential for protein translation. There is a need for integrated software that can analyze multiple types of RNA. Here, we present 3t-seq, a Snakemake pipeline for integrated differential expression analysis of transcripts from single-copy genes, TEs, and tRNAs. Starting from raw sequencing data, 3t-seq performs quality control, genome mapping, gene expression quantification, and statistical testing, and produces an accessible report and easy-to-use results for downstream analysis. It implements three methods to quantify TE expression and one for tRNA genes. It also provides an easy-to-configure way to manage software dependencies, letting users focus on results. 3t-seq is released under the MIT license and is available at https://github.com/boulardlab/3t-seq.

Citations: 0
CMFHMDA: a prediction framework for human disease-microbe associations based on cross-domain matrix factorization.
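IF 6.8 | Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Vol. 25, Issue 6 | Pub Date: 2024-09-23 | DOI: 10.1093/bib/bbae481
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11427075/pdf/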
Jing Chen, Ran Tao, Yi Qiu, Qun Yuan

Predicting associations between microbes and diseases opens up new avenues for developing diagnostic, preventive, and therapeutic strategies. Given that laboratory-based biological tests to verify these associations are often time-consuming and expensive, there is a critical need for innovative computational frameworks to predict new microbe-disease associations. In this work, we introduce a novel prediction algorithm called Predicting Human Disease-Microbe Associations using Cross-Domain Matrix Factorization (CMFHMDA). Initially, we calculate the composite similarity of diseases and the Gaussian interaction profile similarity of microbes. We then apply the Weighted K Nearest Known Neighbors (WKNKN) algorithm to refine the microbe-disease association matrix. Our CMFHMDA model is subsequently developed by integrating the network data of both microbes and diseases to predict potential associations. The key innovations of this method include using the WKNKN algorithm to preprocess missing values in the association matrix and incorporating cross-domain information from microbes and diseases into the CMFHMDA model. To validate CMFHMDA, we employed three different cross-validation techniques to evaluate the model's accuracy. The results indicate that the CMFHMDA model achieved Area Under the Receiver Operating Characteristic Curve scores of 0.9172, 0.8551, and 0.9351 ± 0.0052 in global Leave-One-Out Cross-Validation (LOOCV), local LOOCV, and five-fold CV, respectively. Furthermore, many predicted associations have been confirmed by published experimental studies, establishing CMFHMDA as an effective tool for predicting potential disease-associated microbes.
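The Gaussian interaction profile (GIP) similarity mentioned above is commonly defined as K(m_i, m_j) = exp(−γ ‖IP(m_i) − IP(m_j)‖²), where IP(m_i) is microbe i's binary row of the association matrix and γ = γ′ / ((1/n) Σ_k ‖IP(m_k)‖²), with γ′ typically set to 1. The pure-Python sketch below implements that common form; whether CMFHMDA uses exactly this bandwidth normalization and γ′ = 1 is an assumption, and the association matrix is hypothetical.

```python
import math
from typing import List


def gip_similarity(assoc: List[List[int]], gamma_prime: float = 1.0) -> List[List[float]]:
    """GIP kernel between the rows (microbes) of a binary microbe-disease matrix."""
    n = len(assoc)
    mean_sq_norm = sum(sum(v * v for v in row) for row in assoc) / n
    # Guard against an all-zero matrix, where the bandwidth would be undefined.
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else 0.0
    return [[math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b))) for b in assoc]
            for a in assoc]


# Hypothetical matrix: 3 microbes (rows) x 4 diseases (columns), 1 = known association.
A = [[1, 0, 1, 0],
     [1, 0, 0, 0],
     [0, 1, 0, 1]]
S = gip_similarity(A)
for row in S:
    print([round(v, 3) for v in row])
```

Microbes sharing more known diseases get similarity closer to 1. Applying the same function to the transposed matrix would give a disease-side GIP similarity, although the abstract describes a composite similarity for diseases rather than GIP alone.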

Citations: 0
DeepPBI-KG: a deep learning method for the prediction of phage-bacteria interactions based on key genes.
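IF 6.8 | Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Vol. 25, Issue 6 | Pub Date: 2024-09-23 | DOI: 10.1093/bib/bbae484
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11440089/pdf/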
Tongqing Wei, Chenqi Lu, Hanxiao Du, Qianru Yang, Xin Qi, Yankun Liu, Yi Zhang, Chen Chen, Yutong Li, Yuanhao Tang, Wen-Hong Zhang, Xu Tao, Ning Jiang

Phages, the natural predators of bacteria, were discovered more than 100 years ago. However, rising antimicrobial resistance rates have revitalized phage research. Methods that are faster and more efficient than wet-laboratory experiments are needed to screen phages quickly for therapeutic use. Traditional computational methods usually ignore the fact that phage-bacteria interactions are mediated by key genes and proteins. Methods for intraspecific prediction are rare, since almost all existing methods consider only interactions at the species and genus levels. Moreover, most strains in existing databases contain only partial genome information because whole-genome information is difficult to obtain. Here, we propose a new approach to interaction prediction that constructs new features from key genes and proteins and applies K-means sampling to select high-quality negative samples. Based on feature selection and a deep neural network, we then develop DeepPBI-KG, a corresponding prediction tool. The results show that the average area under the curve (AUC) for each strain reached 0.93, and on the independent test set the overall AUC and the area under the precision-recall curve reached 0.89 and 0.92, respectively; these values are higher than those of other existing prediction tools. Forward and reverse validation results indicate that key genes and key proteins regulate and influence the interaction, supporting the reliability of the model. In addition, intraspecific prediction experiments on Klebsiella pneumoniae data demonstrate the potential applicability of DeepPBI-KG to intraspecific prediction.
In summary, the feature engineering and interaction prediction approaches proposed in this study can effectively improve the robustness and stability of interaction prediction, achieve high generalizability, and may provide new directions and insights for rapid phage screening for therapy.
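The abstract does not spell out how K-means is used to pick negative samples. One common pattern, shown here purely as a hypothetical sketch and not as DeepPBI-KG's procedure, is to cluster the feature vectors of unlabeled phage-bacterium pairs and keep the pair closest to each centroid, so the selected negatives cover the feature space instead of being drawn at random. Centroids are initialized from the first k points for determinism, and the feature vectors are made up.

```python
from typing import List

Point = List[float]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iters: int = 20) -> List[Point]:
    """Plain Lloyd's algorithm; centroids start at the first k points (deterministic)."""
    centroids = [p[:] for p in points[:k]]
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def select_negatives(candidates: List[Point], k: int) -> List[int]:
    """Return indices of the candidate pairs nearest to each of the k centroids."""
    chosen: List[int] = []
    for c in kmeans(candidates, k):
        idx = min(range(len(candidates)), key=lambda i: sq_dist(candidates[i], c))
        if idx not in chosen:
            chosen.append(idx)
    return chosen


# Hypothetical 2-D features of six unlabeled phage-bacterium pairs forming two groups.
pool = [[0.0, 0.0], [10.0, 10.0], [0.2, 0.1], [9.8, 10.1], [0.1, 0.3], [10.2, 9.9]]
print(select_negatives(pool, k=2))  # [2, 1]
```

In practice one would draw several negatives per cluster and use a randomized, repeated initialization (or a library implementation) rather than this minimal loop.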

Citations: 0
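The DeepPBI-KG abstract says K-means sampling is used to select high-quality negative samples but does not describe the procedure. The following is a minimal pure-Python sketch of one plausible reading. It assumes each candidate non-interacting phage-bacteria pair is already encoded as a numeric feature vector. The candidates are clustered with Lloyd's K-means, and the candidates nearest each centroid are kept, so the negatives cover the feature space instead of crowding into one region. The names `kmeans` and `select_negatives` and the nearest-to-centroid rule are illustrative assumptions, not DeepPBI-KG's code.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(points: List[Vector], k: int, iters: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's K-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        # (an emptied cluster keeps its previous centroid).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = _mean(members)
    return labels


def select_negatives(candidates: List[Vector], k: int, per_cluster: int = 1, seed: int = 0) -> List[int]:
    """Hypothetical helper: indices of the candidates closest to each cluster centroid."""
    labels = kmeans(candidates, k, seed=seed)
    chosen: List[int] = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if not idx:
            continue
        centroid = _mean([candidates[i] for i in idx])
        idx.sort(key=lambda i: _dist2(candidates[i], centroid))
        chosen.extend(idx[:per_cluster])
    return sorted(chosen)
```

On two well-separated groups of candidates, `select_negatives(pts, k=2)` returns one representative index per group. In practice, `k` and `per_cluster` would be tuned so that the number of negatives roughly matches the number of positives. That balancing choice is also an assumption.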
Enhancing RNA-seq analysis by addressing all co-existing biases using a self-benchmarking approach with 2D structural insights.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae532
Qiang Su, Yi Long, Deming Gou, Junmin Quan, Qizhou Lian

We introduce a new approach, the minimum free energy-based Gaussian Self-Benchmarking (MFE-GSB) framework, designed to correct the many biases inherent in RNA-seq data. Central to the methodology is the MFE concept, which enables a Gaussian distribution model that mitigates all co-existing biases within a k-mer counting scheme. The MFE-GSB framework uses a dual-model system that compares modeling data with a uniform k-mer distribution against real, observed sequencing data with nonuniform k-mer distributions. The framework fits a Gaussian function to unknown sequencing data using predetermined parameters (mean and SD) derived from the modeling data. This dual comparison allows accurate prediction of k-mer abundances across MFE categories, enabling simultaneous correction of biases at the single k-mer level. Validation with both engineered RNA constructs and human tissue RNA samples demonstrates its wide-ranging efficacy and applicability.

Briefings in bioinformatics 25(6) · Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11491153/pdf/
Citations: 0
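The Gaussian fitting step in the MFE-GSB abstract can be illustrated with a simplified sketch. The sketch makes three assumptions:

- Each k-mer has already been assigned to an MFE category, for example a rounded MFE value.
- The mean and SD come from the MFE values of the uniform modeling data.
- Only the Gaussian's height is fitted to the observed per-category abundances, by least squares.

The ratio of fitted to observed abundance in each category then serves as a correction factor for every k-mer in that category. All function names and the amplitude-only least-squares fit are illustrative assumptions, not the published MFE-GSB implementation.

```python
import math
from typing import Dict, Sequence, Tuple


def mean_sd(values: Sequence[float]) -> Tuple[float, float]:
    """Population mean and SD of the modeling data's MFE values."""
    m = sum(values) / len(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    if sd == 0:
        raise ValueError("modeling data must span more than one MFE value")
    return m, sd


def gaussian(x: float, mu: float, sd: float) -> float:
    """Unnormalized Gaussian shape with fixed mean and SD."""
    return math.exp(-((x - mu) ** 2) / (2 * sd * sd))


def fit_amplitude(xs: Sequence[float], ys: Sequence[float], mu: float, sd: float) -> float:
    """Least-squares height A of A * gaussian(x), with mu and sd held fixed."""
    g = [gaussian(x, mu, sd) for x in xs]
    return sum(y * gi for y, gi in zip(ys, g)) / sum(gi * gi for gi in g)


def correction_factors(model_mfe: Sequence[float], observed: Dict[float, float]) -> Dict[float, float]:
    """Fitted / observed abundance for each MFE category with a nonzero count."""
    mu, sd = mean_sd(model_mfe)
    bins = sorted(observed)
    amp = fit_amplitude(bins, [observed[b] for b in bins], mu, sd)
    return {b: amp * gaussian(b, mu, sd) / observed[b] for b in bins if observed[b] > 0}


def correct_kmer_counts(
    kmer_counts: Dict[str, float],
    kmer_mfe_bin: Dict[str, float],
    factors: Dict[float, float],
) -> Dict[str, float]:
    """Scale each k-mer's count by the factor of its MFE category (1.0 if unknown)."""
    return {k: c * factors.get(kmer_mfe_bin[k], 1.0) for k, c in kmer_counts.items()}
```

Over-represented categories get a factor below 1, and under-represented ones get a factor above 1. The abstract does not specify the following details:

- whether MFE-GSB fits only the amplitude,
- how it bins MFE values,
- how it maps categories back to individual k-mers.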
SpaDiT: diffusion transformer for spatial gene expression prediction using scRNA-seq.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae571
Xiaoyu Li, Fangfang Zhu, Wenwen Min

The rapid development of spatially resolved transcriptomics (SRT) technologies has provided unprecedented opportunities for exploring the structure of specific organs or tissues. However, although some of these techniques (such as image-based SRT) achieve single-cell resolution, they capture the expression levels of only tens to hundreds of genes. Because such spatial transcriptomics (ST) data leave a large number of genes undetected, their application value is limited. To address this challenge, we develop SpaDiT, a deep learning framework for spatial reconstruction and gene expression prediction using scRNA-seq data. SpaDiT uses scRNA-seq data as an a priori condition and uses genes shared between the ST and scRNA-seq data as latent representations to construct its inputs, enabling accurate prediction of gene expression in ST data. SpaDiT improves the accuracy of spatial gene expression prediction across a variety of spatial transcriptomics datasets. We demonstrated the effectiveness of SpaDiT through extensive experiments on both sequencing-based and image-based ST data. Compared with eight highly effective baseline methods, SpaDiT achieved an 8%-12% improvement in performance across multiple metrics. Source code and all datasets used in this paper are available at https://github.com/wenwenmin/SpaDiT and https://zenodo.org/records/12792074.

Briefings in bioinformatics 25(6) · Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11541600/pdf/
Citations: 0
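The SpaDiT abstract mentions two ingredients that can be sketched in a few lines:

- building a model input from the genes that an ST spot shares with the scRNA-seq panel;
- the forward noising step that any denoising diffusion model, including a diffusion transformer, is trained to invert.

This minimal sketch assumes the standard DDPM formulation, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 - ᾱ_t)·ε, with a linear β schedule. The schedule values, the zero-filling of unmeasured genes, and all function names are illustrative assumptions, not SpaDiT's actual code. The transformer denoiser itself is omitted.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def build_masked_input(st_spot: Dict[str, float], sc_genes: Sequence[str]) -> Tuple[List[float], List[int]]:
    """Values over the scRNA-seq gene panel for one ST spot.

    Genes measured in the spot (shared genes) keep their value and get mask 1;
    genes absent from the ST panel are zero-filled with mask 0 (to be predicted).
    """
    values: List[float] = []
    mask: List[int] = []
    for gene in sc_genes:
        if gene in st_spot:
            values.append(st_spot[gene])
            mask.append(1)
        else:
            values.append(0.0)
            mask.append(0)
    return values, mask


def linear_alpha_bars(T: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule (assumed values)."""
    alpha_bars: List[float] = []
    prod = 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / max(T - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(
    x0: Sequence[float], t: int, alpha_bars: Sequence[float], rng: random.Random
) -> Tuple[List[float], List[float]]:
    """Draw x_t ~ q(x_t | x_0); also return the noise a denoiser would learn to predict."""
    ab = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, noise)]
    return xt, noise
```

During training, a denoiser would receive `xt` together with the scRNA-seq condition and be optimized to predict `noise`. The abstract does not describe how SpaDiT injects that condition into the transformer.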
MPA-MutPred: a novel strategy for accurately predicting the binding affinity change upon mutation in membrane protein complexes.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae598
Fathima Ridha, M Michael Gromiha

Mutations in the interface of membrane protein (MP) complexes are key contributors to a broad spectrum of human diseases, primarily because they change the binding affinities of these complexes. While various methods exist for predicting mutation-induced changes in binding affinity (ΔΔG) in protein-protein complexes, none is specific to MP complexes. This study proposes a novel strategy for ΔΔG prediction in MP complexes that combines linear and nonlinear models to obtain a more robust model with improved prediction accuracy. We used multiple linear regression to extract informative features that influence the binding affinity in MP complexes. These features include changes in the stability of the complex, conservation score, electrostatic interactions, relative accessible surface area, and interface contacts. We then applied a gradient boosting regressor to the selected features to develop MPA-MutPred, a novel method specifically for predicting the ΔΔG of membrane protein-protein complexes. MPA-MutPred is freely accessible at https://web.iitm.ac.in/bioinfo2/MPA-MutPred/. Our method achieved a correlation of 0.75 and a mean absolute error (MAE) of 0.73 kcal/mol in a jack-knife test on a dataset of 770 mutants. We further validated the method on a blind test set of 86 mutations, obtaining a correlation of 0.85 and an MAE of 0.77 kcal/mol. We anticipate that this method can be used in large-scale studies of how binding affinity changes contribute to disease-causing mutations in MP complexes. Such studies could aid the understanding of disease mechanisms and the identification of potential therapeutic targets.

Briefings in bioinformatics 25(6) · Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11568875/pdf/
Citations: 0
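The MPA-MutPred jack-knife results come from leave-one-out testing. Each mutant is predicted by a model trained on all the other mutants, and Pearson's correlation and the MAE are computed over those held-out predictions. The sketch below shows that evaluation loop, assuming the model is supplied as a `fit` callable. The one-feature least-squares fitter is only a placeholder so the example runs without dependencies. It is not MPA-MutPred's gradient boosting model, and all names here are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Predictor = Callable[[Sequence[float]], float]
Fitter = Callable[[List[Sequence[float]], List[float]], Predictor]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error (kcal/mol when the targets are ddG values)."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def jackknife(X: List[Sequence[float]], y: List[float], fit: Fitter) -> Tuple[float, float]:
    """Leave-one-out test: returns (Pearson r, MAE) over the held-out predictions."""
    preds: List[float] = []
    for i in range(len(y)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])  # train without mutant i
        preds.append(model(X[i]))  # predict the held-out mutant i
    return pearson_r(y, preds), mae(y, preds)


def fit_one_feature_ols(X: List[Sequence[float]], y: List[float]) -> Predictor:
    """Placeholder model: least-squares line on the first feature only."""
    xs = [row[0] for row in X]
    mx, my = sum(xs) / len(xs), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, y)) / sum((a - mx) ** 2 for a in xs)
    intercept = my - slope * mx
    return lambda row: intercept + slope * row[0]
```

With a real feature matrix, `fit` could instead wrap a gradient boosting regressor such as scikit-learn's `GradientBoostingRegressor`. It would be fitted on the training rows, and `predict` would then be called on the held-out row. The leave-one-out loop itself stays the same.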