
Journal of Computational Biology: Latest Articles

Numerical Analysis of Split-Step Backward Euler Method with Truncated Wiener Process for a Stochastic Susceptible-Infected-Susceptible Model.
IF 1.7 | CAS Zone 4 (Biology) | Q2 Mathematics | Pub Date: 2023-10-01 | Epub Date: 2023-10-09 | DOI: 10.1089/cmb.2022.0462
Xiaochen Yang, Zhanwen Yang, Chiping Zhang

This article studies the numerical positivity, boundedness, convergence, and dynamical behavior of a stochastic susceptible-infected-susceptible (SIS) model. To guarantee that the split-step backward Euler method applied to the stochastic SIS model yields biologically meaningful solutions, numerical positivity and boundedness are established using a truncated Wiener process. Motivated by the almost sure boundedness of the exact and numerical solutions, convergence is proved via the fundamental convergence theorem under a local Lipschitz condition. Moreover, numerical extinction and persistence are obtained for the first time using an exponential representation of the stochastic stability function and the strong law of large numbers for martingales, which reproduces the existing theoretical results. Finally, numerical examples are given to validate the numerical results for the stochastic SIS model.
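A minimal sketch of how a split-step backward Euler (SSBE) step with truncated Brownian increments can keep the infected count inside (0, N). It assumes the commonly used stochastic SIS form dI = I(βN − μ − γ − βI)dt + σI(N − I)dW; the function name `ssbe_sis` and the clipping level `0.99 / (sigma * N)` are illustrative assumptions, not the authors' truncation scheme. The implicit drift step is a quadratic in the unknown, whose positive root always lies in (0, N); clipping |dW| below 1/(σN) then keeps the explicit diffusion step in (0, N) as well.

```python
import math
import random


def ssbe_sis(I0, N, beta, mu, gamma, sigma, h, steps, seed=0):
    """SSBE sketch for dI = I(a - beta*I) dt + sigma*I*(N - I) dW, a = beta*N - mu - gamma.

    Brownian increments are truncated (clipped); this is an assumed scheme.
    """
    rng = random.Random(seed)
    a = beta * N - mu - gamma
    # Hypothetical truncation level: |dW| < 1/(sigma*N) keeps the diffusion step in (0, N).
    bound = 0.99 / (sigma * N)
    I = I0
    path = [I]
    for _ in range(steps):
        # Step 1 (implicit drift): x = I + h*x*(a - beta*x), i.e.
        # beta*h*x^2 + (1 - a*h)*x - I = 0. Take the positive root,
        # written in a cancellation-free form.
        b = 1.0 - a * h
        x = 2.0 * I / (b + math.sqrt(b * b + 4.0 * beta * h * I))
        # Step 2 (explicit diffusion) with a truncated Wiener increment.
        dW = rng.gauss(0.0, math.sqrt(h))
        dW = max(-bound, min(bound, dW))
        I = x + sigma * x * (N - x) * dW
        path.append(I)
    return path
```

For example, `ssbe_sis(I0=10.0, N=100.0, beta=0.5, mu=20.0, gamma=25.0, sigma=0.03, h=0.01, steps=1000)` returns 1001 values that all lie strictly between 0 and 100.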

Citations: 0
Integrating Low-Order and High-Order Correlation Information for Identifying Phage Virion Proteins.
IF 1.7 | CAS Zone 4 (Biology) | Q2 Mathematics | Pub Date: 2023-10-01 | Epub Date: 2023-09-20 | DOI: 10.1089/cmb.2022.0237
Hongliang Zou, Wanting Yu

Phage virion proteins (PVPs) play an important role in the host cell, and fast, accurate identification of PVPs benefits the discovery and development of related drugs. Although wet-lab experiments are the first choice for identifying PVPs, they are costly and time-consuming, so researchers have turned to computational models that can speed up related studies. In this study, we propose a novel machine learning model to identify PVPs. First, protein sequences are represented using 50 types of physicochemical properties. Next, two approaches, Pearson's correlation coefficient (PCC) and the maximal information coefficient (MIC), are used to extract discriminative information. To capture high-order correlation information, PCC and MIC are then applied a second time. The least absolute shrinkage and selection operator (LASSO) algorithm is then used to select the optimal feature subset. Finally, the selected features are fed into a support vector machine to discriminate PVPs from phage non-virion proteins. Experiments on two different datasets validated the effectiveness of the proposed method and showed a significant performance improvement over state-of-the-art approaches, indicating that the proposed model may become a powerful predictor for identifying PVPs.
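The abstract does not spell out how "low-order" and "high-order" correlations are computed, so the following is only one plausible reading, sketched under assumptions: each sequence is encoded as one numeric track per physicochemical property (the property tables below are toy values), low-order features are pairwise PCCs between property tracks, and high-order features are PCCs between the rows of that correlation matrix. MIC, LASSO, and the SVM are omitted; `encode` and `correlation_features` are hypothetical names.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def encode(seq, props):
    """One numeric track per property table (dict: residue -> value)."""
    return [[p[aa] for aa in seq] for p in props]


def correlation_features(tracks):
    k = len(tracks)
    pairs = list(combinations(range(k), 2))
    # Low-order: PCC between every pair of property tracks.
    low = [pearson(tracks[i], tracks[j]) for i, j in pairs]
    # Rows of the full correlation matrix (diagonal = 1).
    rows = [[1.0 if i == j else pearson(tracks[i], tracks[j]) for j in range(k)] for i in range(k)]
    # High-order: PCC between correlation profiles of each pair of properties.
    high = [pearson(rows[i], rows[j]) for i, j in pairs]
    return low + high
```

With k property scales this yields k(k − 1) features (half low-order, half high-order); with 50 scales that would be 2,450 candidates before feature selection.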

Citations: 0
Weighted Selection Probability to Prioritize Susceptible Rare Variants in Multi-Phenotype Association Studies with Application to a Soybean Genetic Data Set.
IF 1.7 | CAS Zone 4 (Biology) | Q2 Mathematics | Pub Date: 2023-10-01 | DOI: 10.1089/cmb.2022.0487
Xianglong Liang, Hokeun Sun

Rare variant association studies with multiple traits or diseases have drawn considerable attention, since the association signals of rare variants can be boosted when more than one phenotype is associated with the same rare variants. Most existing statistical methods for identifying rare variants associated with multiple phenotypes are based on a group test, in which pre-specified genetic regions are tested one at a time. However, these methods are not designed to locate susceptible rare variants within a genetic region. In this article, we propose new statistical methods to prioritize rare variants within a genetic region once a group test for that region has identified a statistical association with multiple phenotypes. The method computes the weighted selection probability (WSP) of each rare variant and ranks the variants from largest to smallest WSP. In simulation studies, we demonstrated that the proposed method outperforms other statistical methods in terms of true positive selection when multiple phenotypes are correlated with each other. We also applied it to our soybean single nucleotide polymorphism (SNP) data with 13 highly correlated amino acids and identified some potentially susceptible rare variants on chromosome 19.
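The abstract does not define how the WSP is computed. A common way to obtain a per-variable selection probability is stability selection (repeated half-sample subsampling followed by a selection rule), so the sketch below uses that idea as an assumption: in each half-sample, variants are scored by the sum of absolute marginal correlations with all phenotypes, the top `top_q` are selected, and the selection frequency is multiplied by a per-variant weight (for example, a hypothetical MAF-based weight). The function name and the selection rule are illustrative, not the authors' procedure.

```python
import math
import random


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def weighted_selection_probability(G, Y, weights, top_q=1, B=100, seed=0):
    """G: genotype column per variant; Y: phenotype column per trait (all length n).

    Returns (wsp, ranking), with ranking ordered from largest to smallest WSP.
    """
    rng = random.Random(seed)
    n = len(Y[0])
    counts = [0] * len(G)
    for _ in range(B):
        idx = rng.sample(range(n), n // 2)  # half-sample without replacement
        ys = [[y[i] for i in idx] for y in Y]
        scores = []
        for g in G:
            gs = [g[i] for i in idx]
            # Multi-phenotype score: sum of |marginal correlation| over traits.
            scores.append(sum(abs(pearson(gs, y)) for y in ys))
        for j in sorted(range(len(G)), key=lambda j: scores[j], reverse=True)[:top_q]:
            counts[j] += 1
    wsp = [w * c / B for w, c in zip(weights, counts)]
    ranking = sorted(range(len(G)), key=lambda j: wsp[j], reverse=True)
    return wsp, ranking
```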

Citations: 0
Identifying Subpopulations of Cells in Single-Cell Transcriptomic Data: A Bayesian Mixture Modeling Approach to Zero Inflation of Counts.
IF 1.7 | CAS Zone 4 (Biology) | Q2 Mathematics | Pub Date: 2023-10-01 | DOI: 10.1089/cmb.2022.0273
Tom Wilson, Duong H T Vo, Thomas Thorne

In the study of single-cell RNA-seq (scRNA-Seq) data, a key component of the analysis is identifying subpopulations of cells. A variety of approaches have been considered, and although many machine learning-based methods have been developed, these rarely estimate the uncertainty in cluster assignment. Probabilistic models have been developed to address this, but scRNA-Seq data exhibit a phenomenon known as dropout, whereby a large proportion of the observed read counts are zero, which makes it challenging to develop probabilistic models that fit the data appropriately. We develop a novel Dirichlet process mixture model that employs both a mixture at the cell level to model multiple populations of cells and a zero-inflated negative binomial mixture of counts at the transcript level. By taking a Bayesian approach, we are able to model the expression of genes within clusters and to quantify uncertainty in cluster assignments. We show that this approach outperforms previous approaches that model scRNA-Seq counts with multinomial distributions, as well as negative binomial models that do not account for zero inflation. Applying it to a publicly available scRNA-Seq data set of multiple cell types from the mouse cortex and hippocampus, we demonstrate how our approach can distinguish subpopulations of cells as clusters in the data and identify gene sets indicative of membership in a subpopulation.
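To make the zero-inflated negative binomial (ZINB) component concrete, here is a minimal sketch of its probability mass function, P(0) = π + (1 − π)NB(0) and P(k) = (1 − π)NB(k) for k > 0, using a mean/dispersion parameterization of the negative binomial. The `cluster_posterior` helper is a hypothetical illustration of how a cell's cluster-membership probabilities follow from per-cluster ZINB likelihoods in a finite mixture; it is not the authors' Dirichlet process sampler, and it assumes genes are conditionally independent given the cluster.

```python
import math


def nb_logpmf(k, mu, r):
    """Negative binomial log-pmf with mean mu and dispersion (size) r."""
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))


def zinb_pmf(k, pi, mu, r):
    """Zero-inflated NB: extra probability mass pi placed on zero (dropout)."""
    nb = math.exp(nb_logpmf(k, mu, r))
    return pi + (1 - pi) * nb if k == 0 else (1 - pi) * nb


def cluster_posterior(counts, weights, params):
    """Posterior cluster probabilities for one cell.

    params[c][g] = (pi, mu, r) for gene g in cluster c.
    """
    logs = []
    for w, gene_params in zip(weights, params):
        ll = math.log(w) + sum(math.log(zinb_pmf(x, *p)) for x, p in zip(counts, gene_params))
        logs.append(ll)
    m = max(logs)  # log-sum-exp for numerical stability
    exps = [math.exp(l - m) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]
```

A cell with counts `[0, 12, 9]` is assigned almost entirely to a cluster with NB means of 10 rather than 1, while the zero count is still plausible under both clusters because of the inflation term.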

Citations: 0
Constructing Double Cyclic Codes over $\mathbb{F}_2 + u\mathbb{F}_2$ for DNA Codes.
IF 1.7 | CAS Zone 4 (Biology) | Q2 Mathematics | Pub Date: 2023-10-01 | DOI: 10.1089/cmb.2022.0151
Arunothai Kanlaya, Chakkrid Klin-Eam

In this article, we investigate the algebraic structure of double cyclic codes of length $(\alpha, \beta)$ over $\mathbb{F}_2 + u\mathbb{F}_2$ with $u^2 = 0$ and construct DNA codes from these codes. We study the theory of constructing double cyclic codes suitable for DNA codes and provide necessary and sufficient conditions for double cyclic codes to be reversible and reversible-complement codes. As an illustration, we present some DNA codes generated from our results.
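The properties named above can be checked by brute force on small examples. This sketch stores each ring element a + bu as a pair (a, b), generates a code as the set of all ring-linear combinations of generator words, and tests double cyclicity (simultaneous cyclic shift of both blocks), reversibility, and reverse-complement closure. Two choices are assumptions for illustration: the DNA map 0→A, 1→C, u→T, 1+u→G (picked so that the Watson-Crick complement is x ↦ x + u), and reversal applied to each block separately; the paper's definitions may differ. Enumeration grows as 4^(number of generators), so it only suits toy sizes.

```python
from itertools import product

# Elements of R = F2 + uF2 (u^2 = 0) stored as (a, b), meaning a + b*u.
R = [(0, 0), (1, 0), (0, 1), (1, 1)]
# Hypothetical DNA map chosen so that the complement is x -> x + u.
DNA = {(0, 0): "A", (1, 0): "C", (0, 1): "T", (1, 1): "G"}


def add(x, y):
    return (x[0] ^ y[0], x[1] ^ y[1])


def mul(x, y):
    # (a + bu)(c + du) = ac + (ad + bc)u, since u^2 = 0
    return (x[0] & y[0], (x[0] & y[1]) ^ (x[1] & y[0]))


def span(gens):
    """All R-linear combinations of the generator words."""
    n = len(gens[0])
    code = set()
    for coeffs in product(R, repeat=len(gens)):
        w = [(0, 0)] * n
        for c, g in zip(coeffs, gens):
            w = [add(wi, mul(c, gi)) for wi, gi in zip(w, g)]
        code.add(tuple(w))
    return code


def reverse(w, alpha):
    # Assumed convention: reverse the alpha-block and the beta-block separately.
    return tuple(reversed(w[:alpha])) + tuple(reversed(w[alpha:]))


def complement(w):
    return tuple(add(x, (0, 1)) for x in w)


def shift(w, alpha):
    # Simultaneous cyclic shift of both blocks.
    a, b = w[:alpha], w[alpha:]
    return a[-1:] + a[:-1] + b[-1:] + b[:-1]


def is_double_cyclic(code, alpha):
    return all(shift(w, alpha) in code for w in code)


def is_reversible(code, alpha):
    return all(reverse(w, alpha) in code for w in code)


def is_reverse_complement(code, alpha):
    return all(complement(reverse(w, alpha)) in code for w in code)


def to_dna(w):
    return "".join(DNA[x] for x in w)
```

For (α, β) = (2, 3), the code spanned by the all-ones word has four constant codewords and passes all three checks, whereas the code spanned by (1, 0 | 1, 0, 0) is not reversible.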

Citations: 0
Reconstruction of Viral Variants via Monte Carlo Clustering.
IF 1.4 | CAS Zone 4 (Biology) | Q4 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2023-09-01 | Epub Date: 2023-09-11 | DOI: 10.1089/cmb.2023.0154
Akshay Juyal, Roya Hosseini, Daniel Novikov, Mark Grinshpon, Alex Zelikovsky

Identifying viral variants through clustering is essential for understanding the composition and structure of viral populations within and between hosts, which play a crucial role in disease progression and epidemic spread. This article proposes and validates novel Monte Carlo (MC) methods for clustering aligned viral sequences by minimizing either the entropy or the Hamming distance from cluster consensus sequences. We validate these methods on four benchmarks: two SARS-CoV-2 interhost data sets and two HIV intrahost data sets. A parallelized version of our tool scales to very large data sets. We show that both entropy-based and Hamming distance-based MC clusterings extract meaningful information from sequencing data, and that the proposed clustering methods consistently converge to similar clusterings across different runs. Finally, we show that MC clustering improves reconstruction of intrahost viral populations from sequencing data.
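A minimal sketch of the Hamming-distance objective and a Metropolis-style Monte Carlo search over cluster labels. Assumptions: the cost is the total Hamming distance of each sequence to its cluster's majority consensus, moves reassign one random sequence to a random cluster, and `temp` is a hypothetical temperature parameter. The entropy objective and the parallelization described above are not shown, and the authors' move set and acceptance rule may differ.

```python
import math
import random
from collections import Counter


def consensus(seqs):
    """Majority base at each aligned column."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def cost(seqs, labels, k):
    """Total Hamming distance from each sequence to its cluster consensus."""
    total = 0
    for c in range(k):
        members = [s for s, l in zip(seqs, labels) if l == c]
        if members:
            cons = consensus(members)
            total += sum(hamming(s, cons) for s in members)
    return total


def mc_cluster(seqs, k, iters=2000, temp=0.5, seed=0):
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in seqs]
    cur = cost(seqs, labels, k)
    best, best_labels = cur, labels[:]
    for _ in range(iters):
        i = rng.randrange(len(seqs))
        old = labels[i]
        labels[i] = rng.randrange(k)  # propose moving one sequence
        new = cost(seqs, labels, k)
        # Accept improvements always, worse moves with Metropolis probability.
        if new <= cur or rng.random() < math.exp((cur - new) / temp):
            cur = new
            if cur < best:
                best, best_labels = cur, labels[:]
        else:
            labels[i] = old  # reject: undo the move
    return best_labels, best
```

On five copies of `"AAAAAAAA"` and five of `"CCCCCCCC"` with `k=2`, the search separates the two groups and reaches cost 0.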

Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10518690/pdf/
Citations: 0
DAHNGC: A Graph Convolution Model for Drug-Disease Association Prediction by Using Heterogeneous Network.
IF 1.7 | CAS Zone 4 (Biology) | Q2 Mathematics | Pub Date: 2023-09-01 | Epub Date: 2023-09-13 | DOI: 10.1089/cmb.2023.0135
Jiancheng Zhong, Pan Cui, Yihong Zhu, Qiu Xiao, Zuohang Qu

In drug development and repositioning, predicting drug-disease associations is a critical task. A recently proposed graph convolution-based method for predicting drug-disease associations relies heavily on the features of adjacent nodes within a homogeneous network to characterize nodes. However, this method lacks node attribute information from heterogeneous networks, which limits the insight it can provide for predicting drug-disease associations. In this study, we propose DAHNGC, a novel drug-disease association prediction model based on a graph convolutional neural network. The model includes two feature extraction methods designed to extract attribute features of drugs and diseases from both homogeneous and heterogeneous networks. First, the DropEdge technique is added to the graph convolutional network to alleviate oversmoothing and to obtain the features of drug or disease nodes within the homogeneous network. Then, an automatic feature extraction method is designed to obtain the features of drugs or diseases at different nodes of the heterogeneous network. Finally, the extracted features are passed through a fully connected network for nonlinear transformation, and potential drug-disease pairs are obtained by bilinear decoding. Experimental results demonstrate that DAHNGC achieves good predictive performance for drug-disease associations.
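The three generic building blocks named above can be sketched in plain Python: DropEdge (randomly removing a fraction of edges, resampled at each training epoch), a graph convolution with the symmetric normalization D^(-1/2)(A + I)D^(-1/2), and a bilinear decoder sigmoid(xᵀMy) for scoring a drug-disease pair. This is an illustrative sketch of those standard components, not DAHNGC itself; the function names are hypothetical, and weights would normally be learned rather than passed in.

```python
import math
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def drop_edge(edges, p, rng):
    """DropEdge: keep each undirected edge independently with probability 1 - p."""
    return [e for e in edges if rng.random() >= p]


def normalized_adj(n, edges):
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2."""
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1.0
    deg = [sum(row) for row in A]
    return [[A[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(A_hat, H, W):
    """One graph convolution: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(A_hat, H), W)]


def bilinear_score(x, M, y):
    """Bilinear decoder: sigmoid(x^T M y) for a drug embedding x and a disease embedding y."""
    z = sum(x[i] * M[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))
    return 1.0 / (1.0 + math.exp(-z))
```

During training, `drop_edge` would be called once per epoch before building `normalized_adj`, so each epoch propagates over a different sparsified graph, which is how DropEdge slows oversmoothing.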

Citations: 0
An Integration Framework of Secure Multiparty Computation and Deep Neural Network for Improving Drug-Drug Interaction Predictions.
IF 1.7 | CAS Zone 4 (Biology) | Q2 Mathematics | Pub Date: 2023-09-01 | Epub Date: 2023-09-14 | DOI: 10.1089/cmb.2023.0076
Liang Pan, Xia Xiao, Shengyun Liu, Shaoliang Peng

Drug-drug interaction (DDI) is a key concern in drug development and pharmacovigilance, and DDI predictions can be improved by integrating multisource data from various pharmaceutical companies. Unfortunately, data privacy and financial interest issues seriously hinder interinstitutional collaboration on DDI prediction. We propose multiparty computation DDI (MPCDDI), a secure MPC-based deep learning framework for DDI prediction. MPCDDI uses secret sharing to incorporate drug-related feature data from multiple institutions and builds a deep learning model for DDI prediction. In MPCDDI, all data transmission and deep learning operations are integrated into secure MPC frameworks, enabling high-quality collaboration among pharmaceutical institutions without divulging private drug-related information. The results suggest that MPCDDI outperforms eight other baselines and achieves performance similar to that of the corresponding plaintext collaboration. More interestingly, MPCDDI significantly outperforms methods that use private data from a single institution. In summary, MPCDDI is an effective framework for promoting collaborative, privacy-preserving drug discovery.
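Secret sharing is the primitive that lets institutions compute on data none of them sees in full. Below is a minimal sketch of additive secret sharing over a prime field, with local addition of shares and multiplication via a Beaver triple, the two operations that linear layers in an MPC neural network reduce to. Assumptions: the modulus 2^61 − 1 (a Mersenne prime), a locally simulated dealer that produces the triple, and integer inputs; real frameworks add fixed-point encoding for real-valued features and run the dealer or offline phase separately. MPCDDI's actual protocol is not specified in the abstract.

```python
import random

Q = 2**61 - 1  # prime modulus (assumed)


def share(x, n, rng):
    """Split x into n additive shares that sum to x mod Q."""
    parts = [rng.randrange(Q) for _ in range(n - 1)]
    parts.append((x - sum(parts)) % Q)
    return parts


def reconstruct(parts):
    return sum(parts) % Q


def add_shares(xs, ys):
    """Each party adds its own shares locally; no communication needed."""
    return [(a + b) % Q for a, b in zip(xs, ys)]


def beaver_mul(xs, ys, rng):
    """Multiply shared x and y using a Beaver triple (a, b, c = a*b)."""
    n = len(xs)
    a, b = rng.randrange(Q), rng.randrange(Q)
    A, B, C = share(a, n, rng), share(b, n, rng), share(a * b % Q, n, rng)
    # Parties open the masked values d = x - a and e = y - b.
    d = reconstruct([(x - ai) % Q for x, ai in zip(xs, A)])
    e = reconstruct([(y - bi) % Q for y, bi in zip(ys, B)])
    # x*y = de + d*b + e*a + ab; the public term de is added by one party only.
    zs = [(ci + d * bi + e * ai) % Q for ai, bi, ci in zip(A, B, C)]
    zs[0] = (zs[0] + d * e) % Q
    return zs
```

With three parties holding shares of 7 and 6, reconstructing `add_shares` gives 13 and reconstructing `beaver_mul` gives 42, while each individual share is a uniformly random field element.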

Citations: 0
HF-DDI: Predicting Drug-Drug Interaction Events Based on Multimodal Hybrid Fusion.
IF 1.7 | CAS Zone 4 (Biology) | Q2 Mathematics | Pub Date: 2023-09-01 | Epub Date: 2023-08-18 | DOI: 10.1089/cmb.2023.0068
An Huang, Xiaolan Xie, Xiaojun Yao, Huanxiang Liu, Xiaoqi Wang, Shaoliang Peng

Drug-drug interactions (DDIs) can have a significant impact on patient safety and health. Predicting potential DDIs before administering drugs to patients is a critical step in drug development and can help prevent adverse drug events. In this study, we propose a novel method called HF-DDI for predicting DDI events based on various drug features, including molecular structure, target, and enzyme information. Specifically, we design our model with both early fusion and late fusion strategies and utilize a score calculation module to predict the likelihood of interactions between drugs. Our model was trained and tested on a large data set of known DDIs, achieving an overall accuracy of 0.948. The results suggest that incorporating multiple drug features can improve the accuracy of DDI event prediction and may be useful for improving drug safety and patient outcomes.
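Early and late fusion differ in where modalities are combined: early fusion concatenates the per-modality feature vectors (for example, structure, target, and enzyme features) before a single scorer, while late fusion scores each modality separately and combines the scores. The sketch below shows both with hypothetical linear scorers and mixes them with an assumed weight `alpha` to form a hybrid score. It produces a single binary interaction likelihood, whereas HF-DDI predicts DDI events, and its actual score calculation module is not described in the abstract.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear(x, w, b=0.0):
    return sum(xi * wi for xi, wi in zip(x, w)) + b


def early_fusion_score(feats, w, b=0.0):
    """Concatenate all modality vectors, then apply one scorer."""
    x = [v for f in feats for v in f]
    return sigmoid(linear(x, w, b))


def late_fusion_score(feats, ws, bs, weights=None):
    """Score each modality separately, then take a weighted average."""
    scores = [sigmoid(linear(f, w, b)) for f, w, b in zip(feats, ws, bs)]
    weights = weights or [1.0 / len(scores)] * len(scores)
    return sum(a * s for a, s in zip(weights, scores))


def hybrid_score(feats, w_early, ws_late, bs_late, alpha=0.5):
    """Hypothetical hybrid: convex combination of early and late fusion scores."""
    return alpha * early_fusion_score(feats, w_early) + (1 - alpha) * late_fusion_score(feats, ws_late, bs_late)
```

Here `feats` would hold one feature vector per modality for a drug pair (for instance, the concatenated features of both drugs); with all weights at zero every scorer returns 0.5.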

Citations: 0
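The HF-DDI abstract combines "early fusion and late fusion strategies" over molecular-structure, target, and enzyme features, followed by a score calculation module. The abstract gives no architecture, so the sketch below only illustrates the two fusion styles under stated assumptions: each modality is a fixed-length vector, a pair of drugs is represented by an order-invariant product/absolute-difference encoding, every scorer is a plain logistic model, and the hybrid score is a hypothetical `alpha`-weighted blend. It predicts a single interaction likelihood, whereas HF-DDI predicts DDI event types. All function names and weights are hypothetical.

```python
import math
import random

# Modality names taken from the features listed in the abstract.
MODALITIES = ("structure", "target", "enzyme")


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def pair_features(a: list[float], b: list[float]) -> list[float]:
    """Order-invariant pair representation: element-wise product and |difference|."""
    return [x * y for x, y in zip(a, b)] + [abs(x - y) for x, y in zip(a, b)]


def linear(weights: list[float], bias: float, x: list[float]) -> float:
    return sum(w * v for w, v in zip(weights, x)) + bias


def early_fusion_score(drug_a, drug_b, weights, bias) -> float:
    """Early fusion: concatenate every modality's pair features, then score once."""
    x: list[float] = []
    for m in MODALITIES:
        x.extend(pair_features(drug_a[m], drug_b[m]))
    return sigmoid(linear(weights, bias, x))


def late_fusion_score(drug_a, drug_b, per_modality) -> float:
    """Late fusion: score each modality with its own model, then average probabilities."""
    probs = []
    for m in MODALITIES:
        w, b = per_modality[m]
        probs.append(sigmoid(linear(w, b, pair_features(drug_a[m], drug_b[m]))))
    return sum(probs) / len(probs)


def hybrid_score(drug_a, drug_b, early_params, late_params, alpha: float = 0.5) -> float:
    """Blend early- and late-fusion probabilities; alpha is a hypothetical mixing weight."""
    early = early_fusion_score(drug_a, drug_b, *early_params)
    late = late_fusion_score(drug_a, drug_b, late_params)
    return alpha * early + (1 - alpha) * late


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 4  # hypothetical per-modality feature length

    def random_drug():
        return {m: [float(rng.randint(0, 1)) for _ in range(dim)] for m in MODALITIES}

    drug_a, drug_b = random_drug(), random_drug()
    early_params = ([rng.gauss(0, 0.1) for _ in range(2 * dim * len(MODALITIES))], 0.0)
    late_params = {m: ([rng.gauss(0, 0.1) for _ in range(2 * dim)], 0.0) for m in MODALITIES}
    print(round(hybrid_score(drug_a, drug_b, early_params, late_params), 3))
```

The symmetric pair encoding makes the score independent of drug order, which suits interaction prediction; in a trained model the weights would be learned rather than sampled.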
Protein Complex Identification Based on Heterogeneous Protein Information Network.
IF 1.7 Zone 4 (Biology) Q2 Mathematics Pub Date: 2023-09-01 Epub Date: 2023-09-06 DOI: 10.1089/cmb.2023.0081
Peixuan Zhou, Yijia Zhang, Zeqian Li, Kuo Pang, Di Zhao

Protein complexes are fundamental to cellular activity, and identifying them accurately is crucial for studying cellular systems. Efficient discovery of protein complexes is a focus of research in bioinformatics. Most existing methods for protein complex identification rely on the structure of the protein-protein interaction (PPI) network, while some integrate biological information to enrich the features of the protein network. However, existing methods cannot fully integrate network topology information with biological attribute information, and most of them operate on homogeneous networks, so they cannot distinguish the importance of different attributes and protein nodes. To address these issues, we propose GO attribute Heterogeneous Attention network Embedding (GHAE), a method based on heterogeneous protein information networks. First, GHAE incorporates Gene Ontology (GO) information into the PPI network to construct a heterogeneous protein information network. Then, GHAE uses a dual attention mechanism and heterogeneous graph convolutional representation learning to learn protein features and identify protein complexes. The experimental results show that building heterogeneous protein information networks can fully integrate valuable biological information, and that heterogeneous graph embedding learning can simultaneously mine the features of proteins and GO attributes, thereby improving the performance of protein complex identification.

Citations: 0
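The GHAE abstract describes two steps: adding GO terms to the PPI network to form a heterogeneous graph, then learning protein embeddings with a dual attention mechanism. The abstract gives no formulas, so the sketch below is an assumption. It stores the graph as a typed adjacency map (protein-protein edges from PPI data, protein-GO edges from annotations), and it uses HAN-style attention: node-level softmax attention within each neighbour type, followed by type-level softmax attention across types, with unparameterized dot-product scores. The functions `build_heterogeneous_network` and `dual_attention_embed` are hypothetical and are not GHAE's actual formulation.

```python
import math
from collections import defaultdict


def build_heterogeneous_network(ppi_edges, go_annotations):
    """Typed adjacency map: nodes are (node_type, node_id) tuples."""
    adj = defaultdict(set)
    for p, q in ppi_edges:                      # undirected protein-protein edges
        adj[("protein", p)].add(("protein", q))
        adj[("protein", q)].add(("protein", p))
    for protein, terms in go_annotations.items():
        for term in terms:                      # undirected protein-GO edges
            adj[("protein", protein)].add(("go", term))
            adj[("go", term)].add(("protein", protein))
    return dict(adj)


def _softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(dim)]


def dual_attention_embed(node, adj, features):
    """Node-level attention within each neighbour type, then type-level attention."""
    h = features[node]
    by_type = defaultdict(list)
    for nb in sorted(adj.get(node, ())):
        by_type[nb[0]].append(features[nb])
    if not by_type:                             # isolated node keeps its own features
        return list(h)
    type_summaries = []
    for node_type in sorted(by_type):
        vecs = by_type[node_type]
        alpha = _softmax([_dot(h, v) for v in vecs])
        type_summaries.append(_weighted_sum(alpha, vecs))
    beta = _softmax([_dot(h, s) for s in type_summaries])
    return _weighted_sum(beta, type_summaries)
```

For example, `build_heterogeneous_network([("P1", "P2")], {"P1": ["GO:0005634"]})` links `("protein", "P1")` to both `("protein", "P2")` and `("go", "GO:0005634")`. A trained model would add learnable per-type projections and stack several layers; clustering the resulting embeddings into complexes is not shown.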
Journal
Journal of Computational Biology