Screening peptides with good affinity is an important step in peptide-drug discovery. Recent advances in computer and data science have made machine learning a useful tool for accurately screening high-affinity peptides. In the current study, four tree-based algorithms, namely classification and regression trees (CART), the C5.0 decision tree (C50), bagged CART (BAG), and random forest (RF), were employed to explore the relationship between experimental peptide affinities and virtual docking data, and the performance of the models was compared in parallel. All four algorithms performed better on the dataset that had been scaled, centered, and PCA-transformed than on the datasets pre-processed in other ways. After the models were rebuilt and their hyperparameters optimized, the optimal C50 model (C50O) showed the best performance in terms of Accuracy, Kappa, Sensitivity, Specificity, F1, MCC, and AUC when validated on the test data and on an unknown PEDV dataset (Accuracy = 80.4 %). BAG and RFO (the optimal RF), the two best models during training, did not perform as expected in the test and unknown-dataset validations. Furthermore, the high correlation of the RFO and BAG predictions with those of C50O implied high stability and robustness of their predictions. In contrast, despite its good performance on the unknown dataset, the poor performance of CARTO (the optimal CART) in the test-data validation and the correlation analysis indicated that it could not be used to predict future data. To accurately evaluate peptide affinity, the current study presents, for the first time, a comparison of tree-based models for predicting high-affinity peptides from virtual docking data, which should expand the application of machine learning algorithms in studying PepPIs and benefit the development of peptide therapeutics.
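As a reading aid for the metrics listed above, the following minimal sketch computes Accuracy, Kappa, Sensitivity, Specificity, F1, and MCC from a binary confusion matrix using their standard textbook definitions. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the study; AUC is omitted because it needs ranked scores rather than counts.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Classification metrics from a 2x2 confusion matrix.

    Assumes every denominator below is non-zero, i.e. both classes are
    present and both are predicted at least once.
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)

    # Cohen's kappa: agreement beyond what chance alone would produce.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (accuracy - p_chance) / (1 - p_chance)

    # Matthews correlation coefficient.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }

# Hypothetical counts, for illustration only (not results from the study).
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
```

Kappa and MCC both correct for class balance, which is why they can rank models differently from raw accuracy.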
Agonists of the β2 adrenergic receptor (ADRB2) are an important class of medications used for the treatment of respiratory diseases. They can be classified as short acting (SABA) or long acting (LABA), with each class playing a different role in patient management. In this work, we explored both ligand-based and structure-based high-throughput approaches to classify β2-agonists by their duration of action. An entirely in silico prediction pipeline using an AlphaFold-generated structure was used for structure-based modelling. Our analysis identified the ligands' 3D structure and lipophilicity as the most relevant features for predicting the duration of action. Interaction-based methods were also able to select ligands with the desired duration of action, incorporating the bias directly into the structure-based drug discovery pipeline without the need for further processing.
Kv2.1 is widely expressed in the brain, and inhibiting Kv2.1 is a potential strategy to prevent cell death and achieve neuroprotection in ischemic stroke. Herein, an in silico model of the Kv2.1 tetramer structure was constructed with the AlphaFold-Multimer deep learning method to facilitate the rational discovery of Kv2.1 inhibitors. GaMD was used to create an ion-transport trajectory, which was analyzed with an HMM to generate multiple representative receptor conformations. The binding site of RY785 and RY796(S) under the P-loop was defined with the Fpocket program together with the competitive binding electrophysiology assay. The docking poses of the two inhibitors were predicted with the aid of semi-empirical quantum mechanical calculations, and the IGMH results suggested that Met375, Thr376, and Thr377 of the P-helix and Ile405 of the S6 segment made significant contributions to the binding affinity. These results provide insights for the rational molecular design of novel Kv2.1 inhibitors.
Predicting the taste of molecules is of critical importance in the food and beverage, flavor, and pharmaceutical industries for the design and screening of new tastants. In this work, we built deep learning models to classify sweet, bitter, and umami molecules, which correspond to the three basic tastes whose sensation is mediated by G protein-coupled receptors. An extensive dataset containing 1466 bitter, 1764 sweet, and 238 umami tastants was curated from the existing literature. We analyzed the chemical characteristics of the molecules, with a special focus on the presence of different functional groups. A deep neural network model based on molecular descriptors and a graph neural network model were trained for taste prediction. The class imbalance caused by the smaller number of umami molecules was tackled using special sampling techniques. Both models showed comparable performance during evaluation, but the graph-based model can learn task-specific representations from the molecular structure without requiring handcrafted features. We further explain the deep neural network predictions using Shapley additive explanations. Finally, we demonstrate the applicability of the models by screening bitter, sweet, and umami molecules from a large food database. This study develops an in silico approach to classify molecules by taste that leverages recent progress in deep learning and can serve as a powerful tool for tastant design.
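The abstract above does not name the sampling techniques used for the umami class. As one common option, and purely as an assumption, the sketch below applies random oversampling so that every class reaches the size of the largest one. The function `random_oversample` and the integer placeholder samples are hypothetical; only the class counts (1466, 1764, 238) come from the text.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until all classes
    have as many members as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw with replacement to fill the gap to the majority size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels

# Class sizes from the abstract; samples are placeholder integer ids.
labels = ["bitter"] * 1466 + ["sweet"] * 1764 + ["umami"] * 238
samples = list(range(len(labels)))

balanced_samples, balanced_labels = random_oversample(samples, labels)
counts = Counter(balanced_labels)
```

In practice, oversampling would be applied to the training split only, so that duplicated molecules do not leak into the evaluation data.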
Numerous natural RNA variants are now known to participate in diverse cellular processes, alongside artificial RNAs such as aptamers and riboswitches. Predicting RNA secondary structure is an essential task in investigating their functions, their mechanisms of influence on cells, and their interactions with targets. Classic thermodynamics-based prediction algorithms do not account for the specifics of biological folding, while the deep learning methods designed to resolve this issue suffer from the same problems as homology-based methods. Herein, we present AliNA (ALIgned Nucleic Acids), a deep learning method for RNA secondary structure prediction. Thanks to data augmentation techniques, our method successfully predicts secondary structures for RNA families that are non-homologous to the training data. Augmentation extends existing datasets with easily accessible simulated data. The proposed method shows high prediction quality across different benchmarks, including structures with pseudoknots. The method is freely available on GitHub (https://github.com/Arty40m/AliNA).
Background: Despite the tremendous efforts made by the scientific community during the COVID-19 pandemic, the disease remains a public health concern. Although different types of vaccines have been used globally to reduce mortality, the emergence of new SARS-CoV-2 variants is a challenge for COVID-19 pharmacotherapy. In this context, targeted therapy of SARS-CoV-2 with small ligands is a promising strategy.
Methods: In this investigation, we applied ligand-based virtual screening to find novel molecules based on the structure of nirmatrelvir. Various criteria, including drug-likeness, ADME, and toxicity properties, were applied to filter the compounds. The selected candidate molecules were subjected to molecular docking and molecular dynamics simulation to predict the binding mode and binding free energy, respectively. The molecules were then evaluated experimentally for antiviral activity against SARS-CoV-2 and for toxicity.
Results: The identified compounds showed inhibitory activity towards SARS-CoV-2 Mpro.
Conclusion: In summary, the identified compounds may provide novel scaffolds for further structural modification and optimization toward improved anti-SARS-CoV-2 Mpro activity.
Cell-penetrating peptides (CPPs) are emerging as an alternative to small-molecule drugs that expands the range of biomolecules that can be targeted for therapeutic purposes. Because identifying and designing new CPPs is important, a great variety of predictors have been developed for these goals. To rank these predictors, a couple of recent studies compared their performance on specific datasets, yet their conclusions cannot determine whether the ranking obtained is due to the model, the set of descriptors, or the datasets used to test the predictors. We present a systematic study of how peptide sequence similarity within the datasets influences the predictors' performance. The analysis reveals that the datasets used for training have a stronger influence on the predictors' performance than the model or descriptors employed. We show that datasets with low sequence similarity between the positive and negative examples can be easily separated, and the tested classifiers performed well on them. In contrast, a dataset with high sequence similarity between CPPs and non-CPPs is a hard dataset, and it is the kind that should be used to assess the performance of new predictors.
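The abstract above does not state how sequence similarity between positive and negative examples was measured. As a rough illustration only, the sketch below uses the matching ratio from Python's standard `difflib.SequenceMatcher` as a stand-in for a proper alignment-based identity, and reports the mean similarity of each positive peptide to its nearest negative. The function names and the toy sequences are hypothetical and are not taken from any CPP dataset.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Ratio in [0, 1] of matching characters between two sequences.
    A stand-in for alignment-based sequence identity."""
    return SequenceMatcher(None, a, b).ratio()

def mean_nearest_similarity(positives, negatives):
    """Average, over the positives, of the similarity to the closest
    negative. Higher values suggest a harder dataset to separate."""
    nearest = [
        max(similarity(p, n) for n in negatives)
        for p in positives
    ]
    return sum(nearest) / len(nearest)

# Toy strings for illustration only (not real dataset entries).
positives = ["RRRRRRRR", "KKWKKWKK"]
easy_negatives = ["DEDEDEDE", "GSGSGSGS"]   # share no residues with positives
hard_negatives = ["RRRRRRRA", "KKWKKWKE"]   # differ by a single residue

easy_score = mean_nearest_similarity(positives, easy_negatives)
hard_score = mean_nearest_similarity(positives, hard_negatives)
```

Under this measure, the second negative set sits much closer to the positives, which is the situation the abstract describes as a hard dataset.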
The multi-step retrosynthesis problem can be solved by a search algorithm, such as Monte Carlo tree search (MCTS). The performance of multi-step retrosynthesis, as measured by a trade-off between search time and route solvability, therefore depends on the hyperparameters of the search algorithm. In this paper, we demonstrate the effect of three MCTS hyperparameters (number of iterations, tree depth, and tree width) on metrics such as the linear integrated speed-accuracy score (LISAS) and the inverse efficiency score, which consider both route solvability and search time. This exploration was conducted with three data-driven approaches: a systematic grid search, Bayesian optimization over an ensemble of molecules to obtain static MCTS hyperparameters, and a machine learning approach that dynamically predicts optimal MCTS hyperparameters for a given input target molecule. With the results obtained on the internal dataset, we demonstrate that it is possible to identify a hyperparameter set that outperforms the current AiZynthFinder default setting. It appeared optimal across a variety of target input molecules, on both proprietary and public datasets. The settings identified with the in-house dataset reached a solvability of 93 % and a median search time of 151 s on the in-house dataset, and a solvability of 74 % and a median search time of 114 s on the ChEMBL dataset. These numbers can be compared with the current default settings, which solved 85 % and 73 % of targets with median search times of 110 s and 84 s for the in-house and ChEMBL datasets, respectively.
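To illustrate the grid-search idea described above, the following sketch enumerates combinations of three hyperparameters and keeps the one with the lowest inverse efficiency score, taken here in its usual form as time divided by the fraction solved. How the paper adapts this score and LISAS to retrosynthesis is not stated in the abstract, so this is an assumption. The names `grid_search`, `toy_evaluate`, and the parameter names and values are hypothetical; `toy_evaluate` is a made-up stand-in for running MCTS on a set of target molecules and does not reflect AiZynthFinder behavior.

```python
from itertools import product

def inverse_efficiency(median_time_s, solvability):
    """Time divided by fraction solved; lower is better."""
    if solvability <= 0:
        return float("inf")
    return median_time_s / solvability

def grid_search(evaluate, grid):
    """Try every combination in `grid` and return the configuration with
    the lowest inverse efficiency score, together with that score."""
    names = list(grid)
    best_cfg, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        cfg = dict(zip(names, values))
        solvability, median_time_s = evaluate(**cfg)
        score = inverse_efficiency(median_time_s, solvability)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

def toy_evaluate(iterations, depth, width):
    # Made-up response surface, for illustration only: more iterations and
    # depth solve more targets, more iterations and width cost more time.
    solvability = min(0.95, 0.40 + 0.002 * iterations + 0.02 * depth)
    median_time_s = 0.5 * iterations + 2.0 * width
    return solvability, median_time_s

# Hypothetical grid values.
grid = {"iterations": [50, 100, 200], "depth": [4, 6], "width": [10, 50]}
best_cfg, best_score = grid_search(toy_evaluate, grid)
```

Because a single scalar score folds solvability and time together, the chosen configuration depends on how the two are weighted; a real study would replace `toy_evaluate` with actual search runs.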