Counting Distinguishable RNA Secondary Structures
Masaru Nakajima, Andrew D Smith
Journal of Computational Biology | Pub Date: 2023-10-01 | Epub Date: 2023-10-09 | DOI: 10.1089/cmb.2022.0501
RNA secondary structures are essential abstractions for understanding the spatial folding behavior of these macromolecules. Many secondary structure algorithms share a common dynamic programming setup that exploits the fact that secondary structures can be decomposed into substructures. Dirks et al. noted that this setup cannot directly address the distinguishability of secondary structures, an issue that arises for classes of sequences that admit nontrivial symmetry; circular sequences are one such class. We examine the problem of counting distinguishable secondary structures. Drawing on elementary results from group theory, we identify useful subsets of secondary structures. We then extend an algorithm due to Hofacker et al. to compute the sizes of these subsets. This yields a cubic-time algorithm for counting the distinguishable structures compatible with a given circular sequence. More generally, this approach may be used to solve similar problems in which the RNA structures of interest involve symmetries.
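The "common dynamic programming setup" mentioned above can be illustrated with the classic recursion for counting all secondary structures of a linear sequence, in the spirit of Hofacker et al. This is a minimal sketch under stated assumptions: canonical Watson-Crick and GU pairs, and a minimum hairpin loop of three unpaired bases (neither is taken from the abstract). It counts all structures, not distinguishable ones, and does not handle circular sequences, so it is only the baseline that the paper extends.

```python
# Minimal sketch: count secondary structures compatible with a LINEAR RNA sequence.
# Assumptions (not from the paper): canonical AU/GC/GU pairs, minimum hairpin loop of 3.

CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def count_structures(seq: str, min_loop: int = 3) -> int:
    """Return the number of pseudoknot-free secondary structures of seq."""
    n = len(seq)
    # N[i][j] counts structures on the half-open interval seq[i:j];
    # every interval of length 0 has exactly one (empty) structure.
    N = [[1] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # Case 1: the last base seq[j-1] is unpaired.
            total = N[i][j - 1]
            # Case 2: seq[j-1] pairs with seq[k], enclosing at least min_loop bases.
            for k in range(i, j - 1 - min_loop):
                if (seq[k], seq[j - 1]) in CANONICAL:
                    total += N[i][k] * N[k + 1][j - 1]
            N[i][j] = total
    return N[0][n]


print(count_structures("GGAAACC"))  # 6
```

For GGAAACC the count is 6: the empty structure, four single G-C pairs, and one pair of nested pairs. Each structure on an interval either leaves the last base unpaired or pairs it with some earlier base k, so the two cases partition the count; this is the decomposition property the abstract refers to, and the triple loop runs in O(n^3) time.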
Numerical Analysis of Split-Step Backward Euler Method with Truncated Wiener Process for a Stochastic Susceptible-Infected-Susceptible Model
Xiaochen Yang, Zhanwen Yang, Chiping Zhang
Journal of Computational Biology | Pub Date: 2023-10-01 | Epub Date: 2023-10-09 | DOI: 10.1089/cmb.2022.0462
This article studies the numerical positivity, boundedness, convergence, and dynamical behavior of the split-step backward Euler method for a stochastic susceptible-infected-susceptible (SIS) model. To guarantee that the method preserves the biological meaning of the stochastic SIS model, numerical positivity and boundedness are investigated by means of a truncated Wiener process. Building on the almost sure boundedness of the exact and numerical solutions, convergence is analyzed via the fundamental convergence theorem under a local Lipschitz condition. Moreover, numerical extinction and persistence are first obtained through an exponential representation of the stochastic stability function and the strong law of large numbers for martingales, reproducing the existing theoretical results. Finally, numerical examples are given to validate these numerical results for the stochastic SIS model.
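The two-stage structure of the split-step backward Euler method can be sketched as follows. This assumes the widely used stochastic SIS form dI = I[(βN − μ − γ − βI) dt + σ(N − I) dB]; the abstract does not state the model equation or the truncation rule, so the clipping level `trunc` is a hypothetical parameter, not the authors' choice.

```python
import numpy as np


def ssbe_sis(I0, N, beta, mu, gamma, sigma, h, steps, trunc, seed=0):
    """Split-step backward Euler for dI = I[(beta*N - mu - gamma - beta*I) dt + sigma*(N - I) dB].

    Brownian increments are clipped to [-trunc, trunc]; trunc is a hypothetical
    tuning parameter, not the truncation rule used in the paper.
    """
    rng = np.random.default_rng(seed)
    a = beta * N - mu - gamma
    I = np.empty(steps + 1)
    I[0] = I0
    for n in range(steps):
        # Step 1 (implicit drift): solve y = I_n + h*y*(a - beta*y) for its positive root,
        # written in a cancellation-free form of the quadratic formula.
        b = 1.0 - a * h
        y = 2.0 * I[n] / (b + np.sqrt(b * b + 4.0 * beta * h * I[n]))
        # Step 2 (explicit diffusion) with a truncated Wiener increment.
        dB = np.clip(rng.normal(0.0, np.sqrt(h)), -trunc, trunc)
        I[n + 1] = y * (1.0 + sigma * (N - y) * dB)
    return I
```

In this sketch the implicit step keeps y strictly inside (0, N) whenever 0 < I_n < N, and one can check that sigma * N * trunc < 1 is then enough to keep every iterate in (0, N). That is a simple sufficient condition for this illustration only, not a result quoted from the paper.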
Integrating Low-Order and High-Order Correlation Information for Identifying Phage Virion Proteins
Hongliang Zou, Wanting Yu
Journal of Computational Biology | Pub Date: 2023-10-01 | Epub Date: 2023-09-20 | DOI: 10.1089/cmb.2022.0237
Phage virion proteins (PVPs) play an important role in the host cell. Fast and accurate identification of PVPs is beneficial for the discovery and development of related drugs. Although wet-lab experiments are the first choice for identifying PVPs, they are costly and time-consuming, so researchers have turned to computational models that can speed up related studies. In this study, we propose a novel machine learning model to identify PVPs. First, 50 types of physicochemical properties were used to encode protein sequences. Next, two measures, Pearson's correlation coefficient (PCC) and the maximal information coefficient (MIC), were employed to extract discriminative information. To capture high-order correlation information, PCC and MIC were then applied a second time. We then used the least absolute shrinkage and selection operator (LASSO) algorithm to select the optimal feature subset. Finally, the selected features were fed into a support vector machine to discriminate PVPs from phage non-virion proteins. Experiments on two different datasets validated the effectiveness of the proposed method and showed a significant improvement in performance over state-of-the-art approaches. These results indicate that the proposed computational model may become a powerful predictor for identifying PVPs.
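The abstract does not say exactly how PCC is applied to the 50 property encodings. One plausible reading, used here purely as an illustration (an assumption, not the authors' construction), represents each protein as a P x L matrix of property values per residue, takes the pairwise PCCs between property rows as low-order features, and correlates the rows of that correlation matrix again to obtain high-order features. MIC needs a dedicated estimator and is omitted, as are the LASSO and SVM stages.

```python
import numpy as np


def low_order_pcc(profile):
    """profile: (P, L) array, one row per physicochemical property, one column per residue.

    Returns the P x P Pearson correlation matrix between property profiles and
    its upper triangle flattened into a feature vector of length P*(P-1)/2.
    """
    C = np.corrcoef(profile)
    iu = np.triu_indices_from(C, k=1)
    return C, C[iu]


def high_order_pcc(C):
    """Correlate the rows of a low-order correlation matrix with each other."""
    H = np.corrcoef(C)
    iu = np.triu_indices_from(H, k=1)
    return H[iu]
```

With P = 50 properties, each stage yields 50 * 49 / 2 = 1225 features per protein, which is why a sparse selector such as LASSO is a natural next step before the classifier.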
Weighted Selection Probability to Prioritize Susceptible Rare Variants in Multi-Phenotype Association Studies with Application to a Soybean Genetic Data Set
Xianglong Liang, Hokeun Sun
Journal of Computational Biology | Pub Date: 2023-10-01 | DOI: 10.1089/cmb.2022.0487
Rare variant association studies with multiple traits or diseases have drawn considerable attention because association signals of rare variants can be boosted when more than one phenotype is associated with the same rare variants. Most existing statistical methods for identifying rare variants associated with multiple phenotypes are based on a group test, in which pre-specified genetic regions are tested one at a time. However, these methods are not designed to locate susceptible rare variants within a genetic region. In this article, we propose new statistical methods to prioritize rare variants within a genetic region once a group test has identified a statistical association between that region and multiple phenotypes. The approach computes the weighted selection probability (WSP) of each rare variant and ranks the variants in decreasing order of WSP. In simulation studies, we demonstrate that the proposed method outperforms other statistical methods in terms of true positive selection when the phenotypes are correlated with each other. We also applied it to our soybean single nucleotide polymorphism (SNP) data with 13 highly correlated amino acids as phenotypes and identified some potentially susceptible rare variants on chromosome 19.
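The abstract does not define how the selection probabilities are computed or weighted. The sketch below shows one stability-selection-style reading, purely as an illustration: variants are repeatedly "selected" on random half-samples, per-phenotype selection frequencies are averaged with weights, and variants are ranked by the result. The marginal-screening selector, the uniform default weights, and the function name are all hypothetical stand-ins, not the authors' method.

```python
import numpy as np


def weighted_selection_probability(X, Y, n_sub=100, top_k=5, weights=None, seed=0):
    """Hypothetical WSP-style ranking of rare variants across several phenotypes.

    X: (n, p) genotype matrix of the rare variants in one region.
    Y: (n, q) phenotype matrix.
    In each random half-sample, a variant counts as "selected" for phenotype k
    if it is among the top_k variants by absolute centered cross-product with
    that phenotype (a simple stand-in for the paper's selection procedure).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    q = Y.shape[1]
    w = np.full(q, 1.0 / q) if weights is None else np.asarray(weights, float) / np.sum(weights)
    counts = np.zeros((p, q))
    for _ in range(n_sub):
        idx = rng.choice(n, size=n // 2, replace=False)
        Xs = X[idx] - X[idx].mean(axis=0)
        Ys = Y[idx] - Y[idx].mean(axis=0)
        score = np.abs(Xs.T @ Ys)                  # (p, q) marginal association scores
        top = np.argsort(-score, axis=0)[:top_k]   # (top_k, q) indices of selected variants
        for k in range(q):
            counts[top[:, k], k] += 1
    sel_prob = counts / n_sub                      # per-phenotype selection probabilities
    wsp = sel_prob @ w                             # weighted average over phenotypes
    ranking = np.argsort(-wsp, kind="stable")      # largest WSP first
    return wsp, ranking
```

When several correlated phenotypes share the same causal variants, those variants are selected consistently for every phenotype, so averaging across phenotypes pushes them to the top of the ranking while noise variants are diluted.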
Identifying Subpopulations of Cells in Single-Cell Transcriptomic Data: A Bayesian Mixture Modeling Approach to Zero Inflation of Counts
Tom Wilson, Duong H T Vo, Thomas Thorne
Journal of Computational Biology | Pub Date: 2023-10-01 | DOI: 10.1089/cmb.2022.0273
In the analysis of single-cell RNA-seq (scRNA-Seq) data, a key step is to identify subpopulations of cells. A variety of approaches have been considered, and although many machine learning-based methods have been developed, they rarely give an estimate of the uncertainty in cluster assignment. Probabilistic models have been developed to address this, but scRNA-Seq data exhibit a phenomenon known as dropout, whereby a large proportion of the observed read counts are zero, which makes it challenging to build probabilistic models that describe the data appropriately. We develop a novel Dirichlet process mixture model that uses both a mixture at the cell level, to model multiple cell populations, and a zero-inflated negative binomial mixture of counts at the transcript level. Taking a Bayesian approach, we model the expression of genes within clusters and quantify the uncertainty in cluster assignments. We show that this approach outperforms previous approaches that model scRNA-Seq counts with multinomial distributions, as well as negative binomial models that do not account for zero inflation. Applying it to a publicly available scRNA-Seq data set of multiple cell types from the mouse cortex and hippocampus, we demonstrate how the approach can distinguish subpopulations of cells as clusters in the data and identify gene sets indicative of subpopulation membership.
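A core ingredient of the transcript-level model is the zero-inflated negative binomial (ZINB) likelihood. The sketch below assumes a mean/dispersion parametrization of the negative binomial and independence of genes within a cluster; both are common choices but are assumptions here, and the Dirichlet process prior and the Bayesian inference scheme are not shown.

```python
import math


def nb_logpmf(y, mu, r):
    """Negative binomial log-pmf with mean mu > 0 and dispersion (size) r > 0."""
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def zinb_logpmf(y, mu, r, pi):
    """Zero-inflated NB: with probability pi a structural (dropout) zero, else NB(mu, r).

    Requires 0 <= pi < 1.
    """
    if y == 0:
        # A zero can come from dropout or from the NB component itself.
        return math.log(pi + (1.0 - pi) * math.exp(nb_logpmf(0, mu, r)))
    return math.log1p(-pi) + nb_logpmf(y, mu, r)


def cell_loglik(counts, mus, rs, pis):
    """Log-likelihood of one cell's count vector under one cluster, genes treated as independent."""
    return sum(zinb_logpmf(y, m, r, p) for y, m, r, p in zip(counts, mus, rs, pis))
```

Setting pi = 0 recovers the plain negative binomial, which is the kind of model the abstract reports the ZINB mixture outperforming; in a mixture, cell_loglik would be evaluated under each cluster's parameters to score assignments.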
Constructing Double Cyclic Codes over $\mathbb{F}_2 + u\mathbb{F}_2$ for DNA Codes
Arunothai Kanlaya, Chakkrid Klin-Eam
Journal of Computational Biology | Pub Date: 2023-10-01 | DOI: 10.1089/cmb.2022.0151
In this article, we investigate the algebraic structure of double cyclic codes of length $(\alpha, \beta)$ over $\mathbb{F}_2 + u\mathbb{F}_2$ with $u^2 = 0$ and construct DNA codes from these codes. We study the theory of constructing double cyclic codes suitable for DNA codes and provide necessary and sufficient conditions for double cyclic codes to be reversible and reversible-complement codes. As an illustration, we present some DNA codes generated from our results.
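The reversibility conditions can be made concrete at the level of DNA codewords. This minimal sketch assumes that a codeword of length $(\alpha, \beta)$ is reversed block by block (the first $\alpha$ symbols and the last $\beta$ symbols are each reversed in place) and uses Watson-Crick complements; the mapping from ring elements of $\mathbb{F}_2 + u\mathbb{F}_2$ to nucleotides is not shown, and the helper names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def block_reverse(word, alpha):
    """Reverse the first alpha symbols and the remaining beta symbols separately."""
    return word[:alpha][::-1] + word[alpha:][::-1]


def block_reverse_complement(word, alpha):
    return "".join(COMPLEMENT[c] for c in block_reverse(word, alpha))


def is_reversible(code, alpha):
    """A code is reversible if it is closed under block-wise reversal."""
    s = set(code)
    return all(block_reverse(w, alpha) in s for w in s)


def is_reversible_complement(code, alpha):
    """A code is reversible-complement if it is closed under block-wise reverse complement."""
    s = set(code)
    return all(block_reverse_complement(w, alpha) in s for w in s)
```

For example, with $\alpha = \beta = 2$, the set {ACGT, CATG} is reversible but not reversible-complement, while {ACGT, GTAC} is reversible-complement but not reversible. The paper's contribution is to characterize when a double cyclic code has these closure properties algebraically, rather than by enumerating codewords as done here.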
Reconstruction of Viral Variants via Monte Carlo Clustering
Akshay Juyal, Roya Hosseini, Daniel Novikov, Mark Grinshpon, Alex Zelikovsky
Journal of Computational Biology | Pub Date: 2023-09-01 | Epub Date: 2023-09-11 | DOI: 10.1089/cmb.2023.0154 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10518690/pdf/
Identifying viral variants through clustering is essential for understanding the composition and structure of viral populations within and between hosts, which play a crucial role in disease progression and epidemic spread. This article proposes and validates novel Monte Carlo (MC) methods for clustering aligned viral sequences by minimizing either entropy or the Hamming distance to cluster consensus sequences. We validate these methods on four benchmarks: two SARS-CoV-2 interhost data sets and two HIV intrahost data sets. A parallelized version of our tool scales to very large data sets. We show that both entropy-based and Hamming distance-based MC clustering extract meaningful information from sequencing data, and that the proposed methods consistently converge to similar clusterings across different runs. Finally, we show that MC clustering improves the reconstruction of intrahost viral populations from sequencing data.
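The Hamming-distance objective can be illustrated with a small Metropolis-style search. This is a minimal sketch, not the authors' tool: the single-sequence move, the temperature `temp`, and the iteration count are hypothetical choices, and the entropy objective and parallelization are not shown.

```python
import math
import random
from collections import Counter


def consensus(seqs):
    """Column-wise majority consensus of equal-length aligned sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def clustering_cost(seqs, labels, k):
    """Total Hamming distance of each sequence to the consensus of its cluster."""
    total = 0
    for c in range(k):
        members = [s for s, lab in zip(seqs, labels) if lab == c]
        if members:
            cons = consensus(members)
            total += sum(hamming(s, cons) for s in members)
    return total


def mc_cluster(seqs, k, iters=2000, temp=0.5, seed=0):
    """Metropolis-style search: move one sequence at a time, keep the best labeling seen."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in seqs]
    cur = clustering_cost(seqs, labels, k)
    best, best_labels = cur, labels[:]
    for _ in range(iters):
        i = rng.randrange(len(seqs))
        old, new = labels[i], rng.randrange(k)
        if new == old:
            continue
        labels[i] = new
        cand = clustering_cost(seqs, labels, k)
        delta = cand - cur
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cur = cand
            if cur < best:
                best, best_labels = cur, labels[:]
        else:
            labels[i] = old  # reject the move
    return best_labels, best
```

Accepting occasional uphill moves lets the search escape poor labelings; the consensus sequence of each final cluster serves as the reconstructed variant. Recomputing the full cost at every step keeps the sketch simple but would be replaced by incremental updates on real data.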
DAHNGC: A Graph Convolution Model for Drug-Disease Association Prediction by Using Heterogeneous Network
Jiancheng Zhong, Pan Cui, Yihong Zhu, Qiu Xiao, Zuohang Qu
Journal of Computational Biology | Pub Date: 2023-09-01 | Epub Date: 2023-09-13 | DOI: 10.1089/cmb.2023.0135
In drug development and repositioning, predicting drug-disease associations is a critical task. A recently proposed graph convolution-based method for predicting drug-disease associations relies heavily on the features of adjacent nodes within the homogeneous network to characterize each node. However, this method lacks node attribute information from heterogeneous networks, so it can hardly provide valuable insights for predicting drug-disease associations. In this study, we propose DAHNGC, a novel drug-disease association prediction model based on a graph convolutional neural network. The model includes two feature extraction methods specifically designed to extract the attribute characteristics of drugs and diseases from both homogeneous and heterogeneous networks. First, the DropEdge technique is added to the graph convolutional neural network to alleviate the oversmoothing problem and obtain the features of same-type nodes (drugs or diseases) in the homogeneous network. Then, an automatic feature extraction method is designed to obtain the features of drugs or diseases from nodes of different types in the heterogeneous network. Finally, the extracted features are fed into a fully connected network for nonlinear transformation, and potential drug-disease pairs are predicted by bilinear decoding. Experimental results demonstrate that the DAHNGC model achieves good predictive performance for drug-disease associations.
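Three generic building blocks named in the abstract can be sketched with NumPy: DropEdge (randomly removing edges during training), a graph convolution with the usual symmetric normalization of the self-looped adjacency matrix, and a bilinear decoder that scores a drug-disease pair. This is an illustration of those standard techniques under assumed shapes and random weights, not the DAHNGC architecture; the heterogeneous-network feature extractor and the fully connected layers are omitted.

```python
import numpy as np


def drop_edge(A, rate, rng):
    """DropEdge: remove each undirected edge of a symmetric adjacency matrix with probability rate."""
    iu = np.triu_indices_from(A, k=1)
    keep = rng.random(iu[0].size) >= rate
    B = np.zeros_like(A)
    B[iu[0][keep], iu[1][keep]] = A[iu][keep]
    return B + B.T


def normalized_adjacency(A):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(A, X, W):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    return np.maximum(normalized_adjacency(A) @ X @ W, 0.0)


def bilinear_score(h_drug, h_disease, M):
    """Association probability sigmoid(h_drug^T M h_disease)."""
    return 1.0 / (1.0 + np.exp(-(h_drug @ M @ h_disease)))
```

Because each training pass sees a different random subgraph, repeated propagation is less likely to make all node embeddings collapse to the same vector, which is the oversmoothing effect DropEdge targets.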
An Integration Framework of Secure Multiparty Computation and Deep Neural Network for Improving Drug-Drug Interaction Predictions
Liang Pan, Xia Xiao, Shengyun Liu, Shaoliang Peng
Journal of Computational Biology | Pub Date: 2023-09-01 | Epub Date: 2023-09-14 | DOI: 10.1089/cmb.2023.0076
Drug-drug interaction (DDI) is a key concern in drug development and pharmacovigilance, and it is important to improve DDI predictions by integrating multisource data from various pharmaceutical companies. Unfortunately, concerns about data privacy and financial interests seriously hinder interinstitutional collaboration on DDI prediction. We propose multiparty computation DDI (MPCDDI), a secure multiparty computation (MPC)-based deep learning framework for DDI prediction. MPCDDI uses secret sharing to combine drug-related feature data from multiple institutions and trains a deep learning model for DDI prediction. In MPCDDI, all data transmission and deep learning operations are carried out within secure MPC frameworks, enabling high-quality collaboration among pharmaceutical institutions without divulging private drug-related information. The results suggest that MPCDDI outperforms eight other baselines and achieves performance similar to that of the corresponding plaintext collaboration. More interestingly, MPCDDI significantly outperforms methods that use private data from a single institution. In summary, MPCDDI is an effective framework for promoting collaborative, privacy-preserving drug discovery.
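The abstract says MPCDDI builds on secret sharing but does not name the protocol. Additive secret sharing is the standard building block, and the sketch below illustrates it with fixed-point encoding of real-valued features. The ring of integers modulo 2^64 and the 16 fractional bits are assumptions for illustration; secure multiplication, which neural network layers need (typically via precomputed Beaver triples), is omitted.

```python
import secrets

MOD = 2 ** 64      # shares live in the ring of integers modulo 2^64
SCALE = 2 ** 16    # fixed-point scale for real-valued drug features


def encode(x: float) -> int:
    return round(x * SCALE) % MOD


def decode(v: int) -> float:
    if v >= MOD // 2:          # upper half of the ring represents negative numbers
        v -= MOD
    return v / SCALE


def share(v: int, n_parties: int) -> list[int]:
    """Split v into n_parties additive shares; any n_parties - 1 of them are uniformly random."""
    parts = [secrets.randbelow(MOD) for _ in range(n_parties - 1)]
    parts.append((v - sum(parts)) % MOD)
    return parts


def reconstruct(parts: list[int]) -> int:
    return sum(parts) % MOD


def add_shared(a: list[int], b: list[int]) -> list[int]:
    """Each party adds its own shares locally; no communication is needed."""
    return [(x + y) % MOD for x, y in zip(a, b)]
```

For example, sharing 1.5 and -2.25 among three parties and adding the shares locally reconstructs to -0.75, while no single party ever sees either input.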
HF-DDI: Predicting Drug-Drug Interaction Events Based on Multimodal Hybrid Fusion
An Huang, Xiaolan Xie, Xiaojun Yao, Huanxiang Liu, Xiaoqi Wang, Shaoliang Peng
Journal of Computational Biology | Pub Date: 2023-09-01 | DOI: 10.1089/cmb.2023.0068
Drug-drug interactions (DDIs) can significantly affect patient safety and health. Predicting potential DDIs before drugs are administered to patients is a critical step in drug development and can help prevent adverse drug events. In this study, we propose HF-DDI, a novel method for predicting DDI events from multiple drug features, including molecular structure, target, and enzyme information. Specifically, the model combines early fusion and late fusion strategies and uses a score calculation module to predict the likelihood of interaction between drugs. The model was trained and tested on a large data set of known DDIs and achieved an overall accuracy of 0.948. The results suggest that incorporating multiple drug features can improve the accuracy of DDI event prediction and may help improve drug safety and patient outcomes.
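The difference between early and late fusion can be shown with a toy scorer. In this sketch, which is not the HF-DDI architecture, early fusion concatenates all modality features of both drugs before a single scorer, late fusion scores each modality separately and averages the probabilities, and a hybrid mixes the two. The linear scorers, the binary interaction probability (HF-DDI predicts event types), and the mixing weight `alpha` are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def early_fusion_score(feats_a, feats_b, w):
    """Concatenate every modality of both drugs first, then apply one linear scorer."""
    x = np.concatenate(list(feats_a) + list(feats_b))
    return float(sigmoid(x @ w))


def late_fusion_score(feats_a, feats_b, ws):
    """Score each modality separately on the drug pair, then average the probabilities."""
    scores = [sigmoid(np.concatenate([fa, fb]) @ w) for fa, fb, w in zip(feats_a, feats_b, ws)]
    return float(np.mean(scores))


def hybrid_score(feats_a, feats_b, w_early, ws_late, alpha=0.5):
    """Convex combination of early- and late-fusion scores (alpha is a hypothetical weight)."""
    early = early_fusion_score(feats_a, feats_b, w_early)
    late = late_fusion_score(feats_a, feats_b, ws_late)
    return alpha * early + (1 - alpha) * late
```

Early fusion lets the scorer model interactions across modalities, while late fusion keeps each modality's evidence separate and is more robust when one modality is noisy; a hybrid trades off the two.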