Alicja Gawalska, Natalia Czub, Michał Sapa, Marcin Kołaczkowski, Adam Bucki, Aleksander Mendyk
Asthma and COPD are characterized by complex pathophysiology associated with chronic inflammation, bronchoconstriction, and bronchial hyperresponsiveness, resulting in airway remodeling. A possible comprehensive solution that could fully counteract the pathological processes of both diseases is the use of rationally designed multi-target-directed ligands (MTDLs) that combine PDE4B and PDE8A inhibition with TRPA1 blockade. The aim of the study was to develop AutoML models to search for novel MTDL chemotypes blocking PDE4B, PDE8A, and TRPA1. Regression models were developed for each of the biological targets using "mljar-supervised". Based on these models, virtual screenings of commercially available compounds from the ZINC15 database were performed. A common group of compounds ranked among the top results for all targets was selected as potential novel chemotypes of multifunctional ligands. This study represents the first attempt to discover potential MTDLs inhibiting these three biological targets. The results demonstrate the usefulness of the AutoML methodology for identifying hits in large compound databases.
{"title":"Application of automated machine learning in the identification of multi-target-directed ligands blocking PDE4B, PDE8A, and TRPA1 with potential use in the treatment of asthma and COPD.","authors":"Alicja Gawalska, Natalia Czub, Michał Sapa, Marcin Kołaczkowski, Adam Bucki, Aleksander Mendyk","doi":"10.1002/minf.202200214","DOIUrl":"https://doi.org/10.1002/minf.202200214","url":null,"abstract":"<p><p>Asthma and COPD are characterized by complex pathophysiology associated with chronic inflammation, bronchoconstriction, and bronchial hyperresponsiveness resulting in airway remodeling. A possible comprehensive solution that could fully counteract the pathological processes of both diseases are rationally designed multi-target-directed ligands (MTDLs), combining PDE4B and PDE8A inhibition with TRPA1 blockade. The aim of the study was to develop AutoML models to search for novel MTDL chemotypes blocking PDE4B, PDE8A, and TRPA1. Regression models were developed for each of the biological targets using \"mljar-supervised\". On their basis, virtual screenings of commercially available compounds derived from the ZINC15 database were performed. A common group of compounds placed within the top results was selected as potential novel chemotypes of multifunctional ligands. This study represents the first attempt to discover the potential MTDLs inhibiting three biological targets. 
The obtained results prove the usefulness of AutoML methodology in the identification of hits from the big compound databases.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"42 7","pages":"e2200214"},"PeriodicalIF":3.6,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9796310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
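The selection step in the abstract above (keeping the compounds that rank near the top for every target) can be illustrated with a minimal sketch. The compound IDs, the predicted activity values, and the 50 % cut-off are all hypothetical and are not taken from the study, which does not state its selection threshold here.

```python
def top_fraction(scores, fraction):
    """Return the IDs of the best-scoring fraction of compounds (higher = better)."""
    n_keep = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_keep])


def consensus_hits(per_target_scores, fraction=0.5):
    """Intersect the top-ranked sets obtained separately for each target."""
    top_sets = [top_fraction(s, fraction) for s in per_target_scores.values()]
    return set.intersection(*top_sets)


# Hypothetical predicted activities (e.g. pIC50) from three per-target regression models.
predicted = {
    "PDE4B": {"cpd_a": 7.9, "cpd_b": 6.1, "cpd_c": 8.4, "cpd_d": 5.0},
    "PDE8A": {"cpd_a": 7.2, "cpd_b": 7.8, "cpd_c": 6.9, "cpd_d": 4.8},
    "TRPA1": {"cpd_a": 6.5, "cpd_b": 5.2, "cpd_c": 7.1, "cpd_d": 6.0},
}

hits = consensus_hits(predicted, fraction=0.5)
print(hits)
```

Only compounds that survive the cut for all three targets remain, which is why a strict intersection tends to return a small candidate list.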
Drug-induced liver injury (DILI) is one of the major causes of drug withdrawals, acute liver injury, and black-box warnings. Clinical diagnosis of DILI is a major challenge because of its complex pathogenesis and the lack of specific biomarkers. In recent years, machine learning methods have been used for DILI risk assessment, but the resulting models do not generalize satisfactorily. In this study, we constructed a large DILI data set and proposed an integration strategy based on hybrid representations for DILI prediction (HR-DILI). Benefiting from feature integration, the hybrid graph neural network models outperformed single-representation models; among them, hybrid-GraphSAGE showed balanced performance in cross-validation, with an AUC (area under the curve) of 0.804±0.019. On the external validation set, HR-DILI improved the AUC by 6.4 % to 35.9 % compared with the base models using a single representation. Compared with published DILI prediction models, HR-DILI showed better and more balanced performance. The performance of local models for natural products and synthetic compounds was also explored. Furthermore, eight key descriptors and six structural alerts associated with DILI were analyzed to increase the interpretability of the models. The improved performance of HR-DILI indicates that it can provide reliable guidance for DILI risk assessment.
{"title":"In silico prediction of drug-induced liver injury with a complementary integration strategy based on hybrid representation.","authors":"Yaxin Gu, Yimeng Wang, Zengrui Wu, Weihua Li, Guixia Liu, Yun Tang","doi":"10.1002/minf.202200284","DOIUrl":"https://doi.org/10.1002/minf.202200284","url":null,"abstract":"<p><p>Drug-induced liver injury (DILI) is one of the major causes of drug withdrawals, acute liver injury and blackbox warnings. Clinical diagnosis of DILI is a huge challenge due to the complex pathogenesis and lack of specific biomarkers. In recent years, machine learning methods have been used for DILI risk assessment, but the model generalization does not perform satisfactorily. In this study, we constructed a large DILI data set and proposed an integration strategy based on hybrid representations for DILI prediction (HR-DILI). Benefited from feature integration, the hybrid graph neural network models outperformed single representation-based models, among which hybrid-GraphSAGE showed balanced performance in cross-validation with AUC (area under the curve) as 0.804±0.019. In the external validation set, HR-DILI improved the AUC by 6.4 %-35.9 % compared to the base model with a single representation. Compared with published DILI prediction models, HR-DILI had better and balanced performance. The performance of local models for natural products and synthetic compounds were also explored. Furthermore, eight key descriptors and six structural alerts associated with DILI were analyzed to increase the interpretability of the models. 
The improved performance of HR-DILI indicated that it would provide reliable guidance for DILI risk assessment.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"42 7","pages":"e2200284"},"PeriodicalIF":3.6,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9849638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Timothy B Dunn, Edgar López-López, Taewon David Kim, José L Medina-Franco, Ramón Alain Miranda-Quintana
Understanding structure-activity landscapes is essential in drug discovery. It has also been shown that the presence of activity cliffs in compound data sets can substantially affect not only design progress but also the predictive ability of machine learning models. With the continued expansion of chemical space and the large and ultra-large libraries currently available, it is imperative to implement efficient tools for rapidly analyzing the activity landscape of compound data sets. The goal of this study is to show that the n-ary indices can quantify the structure-activity landscapes of large compound data sets rapidly and efficiently using different types of structural representation. We also discuss how a recently introduced medoid algorithm provides the foundation for finding optimum correlations between similarity measures and structure-activity rankings. The applicability of the n-ary indices and the medoid algorithm is shown by analyzing the activity landscape of 10 compound data sets with pharmaceutical relevance using three fingerprints of different designs, 16 extended similarity indices, and 11 coincidence thresholds.
{"title":"Exploring activity landscapes with extended similarity: is Tanimoto enough?","authors":"Timothy B Dunn, Edgar López-López, Taewon David Kim, José L Medina-Franco, Ramón Alain Miranda-Quintana","doi":"10.1002/minf.202300056","DOIUrl":"https://doi.org/10.1002/minf.202300056","url":null,"abstract":"<p><p>Understanding structure-activity landscapes is essential in drug discovery. Similarly, it has been shown that the presence of activity cliffs in compound data sets can have a substantial impact not only on the design progress but also can influence the predictive ability of machine learning models. With the continued expansion of the chemical space and the currently available large and ultra-large libraries, it is imperative to implement efficient tools to analyze the activity landscape of compound data sets rapidly. The goal of this study is to show the applicability of the n-ary indices to quantify the structure-activity landscapes of large compound data sets using different types of structural representation rapidly and efficiently. We also discuss how a recently introduced medoid algorithm provides the foundation to finding optimum correlations between similarity measures and structure-activity rankings. 
The applicability of the n-ary indices and the medoid algorithm is shown by analyzing the activity landscape of 10 compound data sets with pharmaceutical relevance using three fingerprints of different designs, 16 extended similarity indices, and 11 coincidence thresholds.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"42 7","pages":"e2300056"},"PeriodicalIF":3.6,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9794062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
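To make the idea of an n-ary (extended) similarity index with a coincidence threshold more concrete, here is a simplified sketch of an unweighted extended Jaccard-Tanimoto index for binary fingerprints. It is an illustration under stated assumptions, not the authors' implementation: the function name is hypothetical, the weighting functions used in the published extended similarity framework are omitted, and the default threshold (N mod 2) is assumed to be the minimal one.

```python
def extended_jaccard_tanimoto(fingerprints, threshold=None):
    """Unweighted n-ary Jaccard-Tanimoto index for a set of binary fingerprints.

    Each bit position is classified by comparing |2k - N| with a coincidence
    threshold, where k is the number of fingerprints with that bit set and N is
    the number of fingerprints. Cost is linear in N, with no pairwise comparisons.
    """
    n = len(fingerprints)
    if threshold is None:
        threshold = n % 2  # assumed minimal coincidence threshold
    one_sim = dissim = 0
    for bit in range(len(fingerprints[0])):
        k = sum(fp[bit] for fp in fingerprints)
        if abs(2 * k - n) > threshold:
            if 2 * k > n:
                one_sim += 1  # bit is "on" in enough fingerprints: 1-similarity
            # otherwise 0-similarity, which Jaccard-Tanimoto ignores
        else:
            dissim += 1       # mixed column: dissimilarity
    denominator = one_sim + dissim
    return one_sim / denominator if denominator else 0.0


pair = [[1, 1, 0, 0], [1, 0, 0, 1]]
triple = [[1, 1, 0, 0], [1, 0, 0, 1], [1, 1, 0, 0]]
print(extended_jaccard_tanimoto(pair))    # for N = 2 this equals pairwise Tanimoto
print(extended_jaccard_tanimoto(triple))
```

For two fingerprints the function reduces to the familiar pairwise Tanimoto coefficient; raising `threshold` makes the definition of a "coincident" bit stricter.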
Gopi Mohan C, Anju Pushkaran, Kumaran K, Ann MariaT, Raja Biswas
PD-1/PD-L1 is a critical druggable target for immunotherapy against sepsis. The chemoinformatics workflow involved development of a structure-based 3D pharmacophore model followed by virtual screening of small-molecule databases to identify inhibitors of the PD-L1 pathway. Using these in silico methods, Raltitrexed and Safinamide were identified as potent repurposed drugs, together with three other compounds from the Specs database. These compounds were screened based on the pharmacophore fit score and binding affinity towards the active site of the PD-L1 protein. In silico pharmacokinetic profiling of the screened compounds was performed to test their biological activity. Next, the best four virtually screened hits were validated experimentally in vitro for hemocompatibility and cytotoxicity. Among these, Raltitrexed, Safinamide, and the Specs compound AK-968/40642641 effectively increased the proliferation of immune cells and IFN-γ production. These compounds can act as potent PD-L1 inhibitors for adjuvant therapy against sepsis.
{"title":"Identification of a PD1/PD-L1 inhibitor by structure-based pharmacophore modelling, virtual screening, molecular docking and biological evaluation.","authors":"Gopi Mohan C, Anju Pushkaran, Kumaran K, Ann MariaT, Raja Biswas","doi":"10.1002/minf.202200254","DOIUrl":"https://doi.org/10.1002/minf.202200254","url":null,"abstract":"<p><p>PD-1/PD-L1 is a critical druggable target for immunotherapy against sepsis. Chemoinformatics techniques involved the structure-based 3D pharmacophore model development followed by virtual screening of small molecule databases to identify the small molecules against PD-L1 pathway inhibition. Raltitrexed and Safinamide act as potent repurposed drugs, and three other Specs database compounds using in silico methods. These compounds were screened based on the pharmacophore fit score and binding affinity towards the active site of the PD-L1 protein. In silico pharmacokinetic profiling of these screened compounds was done to test their biological activity. Next, experimental validation of the best four virtually screened hits was done in vitro for its hemocompatibility and cytotoxicity. Among these, Raltitrexed, Safinamide and Specs compound (AK-968/40642641) effectively increased the proliferation of immune cells and IFN-γ production. These compounds can act as potent PDL-1 inhibitors for adjuvant therapy against sepsis.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"42 6","pages":"e2200254"},"PeriodicalIF":3.6,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9680278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Several binary molecular fingerprints were compressed using an autoencoder neural network. We analyzed the impact of compression on fingerprint performance in downstream classification and regression tasks. Classifiers trained on compressed fingerprints were negligibly affected. Regression models benefitted from compression, especially of long fingerprints (Morgan, RDK). However, their performance dropped rapidly for compression levels exceeding 90 %. Property co-learning positively influenced the predictive power of the compressed fingerprints, with a mean score improvement up to 20 %, suggesting that autoencoder compression with property co-learning biases the molecular representation toward the predicted target, facilitating downstream training.
{"title":"Compression of molecular fingerprints with autoencoder networks.","authors":"Gisbert Schneider, Agnieszka Ilnicka","doi":"10.1002/minf.202300059","DOIUrl":"https://doi.org/10.1002/minf.202300059","url":null,"abstract":"<p><p>Several binary molecular fingerprints were compressed using an autoencoder neural network. We analyzed the impact of compression on fingerprint performance in downstream classification and regression tasks. Classifiers trained on compressed fingerprints were negligibly affected. Regression models benefitted from compression, especially of long fingerprints (Morgan, RDK). However, their performance dropped rapidly for compression levels exceeding 90 %. Property co-learning positively influenced the predictive power of the compressed fingerprints, with a mean score improvement up to 20 %, suggesting that autoencoder compression with property co-learning biases the molecular representation toward the predicted target, facilitating downstream training.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"42 6","pages":"e2300059"},"PeriodicalIF":3.6,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9681391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Luis A García-González, Yovani Marrero-Ponce, Carlos A Brizuela, César R García-Jacas
Predicting the likely biological activity (or property) of compounds is a fundamental and challenging task in the drug discovery process. Current computational methodologies aim to improve their predictive accuracy by using deep learning (DL) approaches. However, non-DL approaches have been shown to be the most suitable for small- and medium-sized chemical datasets. In such approaches, an initial universe of molecular descriptors (MDs) is first calculated, then different feature selection algorithms are applied, and finally one or several predictive models are built. Herein we demonstrate that this traditional approach may miss relevant information by assuming that the initial universe of MDs codifies all relevant aspects of the respective learning task. We argue that this limitation arises mainly from the constrained intervals of the parameters used in the algorithms that compute MDs, parameters that define the Descriptor Configuration Space (DCS). We propose to relax these constraints in an open DCS approach, so that a larger universe of MDs can be initially considered. We model the generation of MDs as a multicriteria optimization problem and tackle it with a variant of the standard genetic algorithm. As a novel component, the fitness function is computed by aggregating four criteria via the Choquet integral. Experimental results show that the proposed approach generates a meaningful DCS, improving on state-of-the-art approaches for most of the benchmark chemical datasets considered.
{"title":"Overproduce and select, or determine optimal molecular descriptor subset via configuration space optimization? Application to the prediction of ecotoxicological endpoints.","authors":"Luis A García-González, Yovani Marrero-Ponce, Carlos A Brizuela, César R García-Jacas","doi":"10.1002/minf.202200227","DOIUrl":"https://doi.org/10.1002/minf.202200227","url":null,"abstract":"<p><p>Predicting the likely biological activity (or property) of compounds is a fundamental and challenging task in the drug discovery process. Current computational methodologies aim to improve their predictive accuracies by using deep learning (DL) approaches. However, non-DL based approaches for small- and medium-sized chemical datasets have demonstrated to be most suitable for. In this approach, an initial universe of molecular descriptors (MDs) is first calculated, then different feature selection algorithms are applied, and finally, one or several predictive models are built. Herein we demonstrate that this traditional approach may miss relevant information by assuming that the initial universe of MDs codifies all relevant aspects for the respective learning task. We argue that this limitation is mainly because of the constrained intervals of the parameters used in the algorithms that compute MDs, parameters that define the Descriptor Configuration Space (DCS). We propose to relax these constraints in an open CDS approach, so that a larger universe of MDs can be initially considered. We model the generation of MDs as a multicriteria optimization problem and tackle it with a variant of the standard genetic algorithm. As a novel component, the fitness function is computed by aggregating four criteria via the Choquet integral. 
Experimental results show that the proposed approach generates a meaningful DCS by improving state-of-the-art approaches in most of the benchmarking chemical datasets accounted for.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"42 6","pages":"e2200227"},"PeriodicalIF":3.6,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9682498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
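The abstract above aggregates four criteria with the Choquet integral. As a rough illustration of what that aggregation does, the sketch below implements the standard discrete Choquet integral for non-negative scores. The criterion names, scores, and fuzzy-measure values are hypothetical; the study's actual criteria and measure are not given in this listing.

```python
from itertools import combinations


def choquet_integral(values, measure):
    """Discrete Choquet integral of non-negative criterion scores.

    values:  dict mapping criterion name -> score
    measure: dict mapping frozenset of criterion names -> weight in [0, 1],
             with the full set mapped to 1.0 (a fuzzy measure)
    """
    order = sorted(values, key=values.get)  # criteria by ascending score
    total, previous = 0.0, 0.0
    for i, name in enumerate(order):
        coalition = frozenset(order[i:])    # criteria scoring at least this much
        total += (values[name] - previous) * measure[coalition]
        previous = values[name]
    return total


def additive_measure(weights):
    """Build an additive measure; with it, Choquet reduces to a weighted sum."""
    names = list(weights)
    return {
        frozenset(combo): sum(weights[c] for c in combo)
        for r in range(1, len(names) + 1)
        for combo in combinations(names, r)
    }


# Four hypothetical criteria with an additive measure.
weights = {"c1": 0.4, "c2": 0.3, "c3": 0.2, "c4": 0.1}
scores = {"c1": 0.5, "c2": 0.9, "c3": 0.1, "c4": 0.7}
additive_result = choquet_integral(scores, additive_measure(weights))

# Two criteria with a non-additive measure that models interaction.
interacting = {frozenset({"a", "b"}): 1.0, frozenset({"a"}): 0.3, frozenset({"b"}): 0.4}
interacting_result = choquet_integral({"a": 0.2, "b": 0.8}, interacting)
print(additive_result, interacting_result)
```

The value of a non-additive measure is that it can reward or penalize combinations of criteria, which a plain weighted sum cannot express.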
Jia-Xi Chang, Jian-Wei Zou, Chao-Yuan Lou, Jia-Xin Ye, Rui Feng, Zi-Yuan Li, Gui-Xiang Hu
The present work was devoted to exploring the quantitative structure-property relationships for gas-to-ionic liquid partition coefficients (log K_ILA). A series of linear models was first established for the representative dataset (IL01). The optimal model was a four-parameter equation (1Ed) consisting of two electrostatic potential-based descriptors ($\Sigma V_{s,ind}^{-}$ and $V_{s,max}$), one 2D matrix-based descriptor (J_D/Dt), and the dipole moment (μ). Each of the four descriptors in the model corresponds, directly or indirectly, to a parameter of Abraham's linear solvation energy relationship (LSER) or its theoretical alternatives, which endows the model with good interpretability. A Gaussian process was used to build the nonlinear model. Systematic validations, including 5-fold cross-validation on the training set, validation on the test set, and a more rigorous Monte Carlo cross-validation, were performed to verify the reliability of the constructed models. The applicability domain of the model was evaluated, and the Williams plot revealed that the model can be used to predict the log K_ILA values of structurally diverse solutes. The other 13 datasets were processed in the same way, and linear models with expressions similar to equation 1Ed were obtained for all of them. These models, whether linear or nonlinear, give satisfactory statistical results, which confirms the universality of the method adopted in this study for QSPR modeling of gas-to-IL partition.
{"title":"Gas-to-ionic liquid partition: QSPR modeling and mechanistic interpretation.","authors":"Jia-Xi Chang, Jian-Wei Zou, Chao-Yuan Lou, Jia-Xin Ye, Rui Feng, Zi-Yuan Li, Gui-Xiang Hu","doi":"10.1002/minf.202200223","DOIUrl":"https://doi.org/10.1002/minf.202200223","url":null,"abstract":"<p><p>The present work was devoted to explore the quantitative structure-property relationships for gas-to-ionic liquid partition coefficients (log K<sub>ILA</sub> ). A series of linear models were first established for the representative dataset (IL01). The optimal model was a four-parameter equation (1Ed) consisting of two electrostatic potential-based descriptors ( <math> <semantics><mrow><mi>Σ</mi> <msubsup><mi>V</mi> <mrow><mi>s</mi> <mo>,</mo> <mi>i</mi> <mi>n</mi> <mi>d</mi></mrow> <mo>-</mo></msubsup> </mrow> <annotation>${{rm { Sigma }}{V}_{s,ind}^{-}}$</annotation> </semantics> </math> and V<sub>s,max</sub> ), one 2D matrix-based descriptor (J_D/Dt) and dipole moment (μ). All of the four descriptors introduced in the model can find the corresponding parameters, directly or indirectly, from Abraham's linear solvation energy relationship (LSER) or its theoretical alternatives, which endows the model good interpretability. Gaussian process was utilized to build the nonlinear model. Systematical validations, including 5-fold cross-validation for the training set, the validation for test set, as well as a more rigorous Monte Carlo cross-validation were performed to verify the reliability of the constructed models. Applicability domain of the model was evaluated, and the Williams plot revealed that the model can be used to predict the log K<sub>ILA</sub> values of structurally diverse solutes. The other 13 datasets were also processed in the same way, and all of the linear models with expressions similar to equation 1Ed were obtained. 
These models, whether linear of nonlinear, represent satisfactory statistical results, which confirms the universality of the method adopted in this study in QSPR modeling of gas-to-IL partition.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"42 6","pages":"e2200223"},"PeriodicalIF":3.6,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10056650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
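The applicability-domain check mentioned in the abstract above relies on a Williams plot, which combines standardized residuals with leverage values. The sketch below covers only the leverage part, for a model with an intercept and a single descriptor so that no linear algebra library is needed; the abstract's model has four descriptors. The descriptor values are hypothetical, and the warning leverage h* = 3(p + 1)/n is the convention commonly used in QSAR/QSPR work, assumed here rather than taken from the paper.

```python
def leverages(x):
    """Leverage of each sample for a one-descriptor linear model with intercept."""
    n = len(x)
    mean = sum(x) / n
    sxx = sum((xi - mean) ** 2 for xi in x)
    return [1.0 / n + (xi - mean) ** 2 / sxx for xi in x]


def warning_leverage(n_samples, n_descriptors):
    """Commonly used threshold h* = 3(p + 1)/n for flagging influential samples."""
    return 3.0 * (n_descriptors + 1) / n_samples


# Hypothetical descriptor values; the last solute is structurally atypical.
descriptor = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 1.1, 0.9, 5.0]

h = leverages(descriptor)
h_star = warning_leverage(len(descriptor), n_descriptors=1)
outside_domain = [i for i, hi in enumerate(h) if hi > h_star]
print(h_star, outside_domain)
```

Samples whose leverage exceeds h* lie far from the centroid of the training descriptors, so predictions for them are extrapolations and should be treated with caution.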
Amenah M Al-Imam, Safa Daoud, Ma'mon M Hatmal, Mutasem Omar Taha
Dual-specificity protein kinase threonine/tyrosine kinase (TTK) is one of the mitotic kinases. High levels of TTK are detected in several types of cancer; hence, TTK inhibition is considered a promising anti-cancer therapeutic strategy. In this work, we used multiple docked poses of TTK inhibitors to augment training data for machine learning QSAR modeling. Ligand-receptor contact fingerprints and docking scores were used as descriptor variables. Escalating docking-scoring consensus levels were scanned against orthogonal machine learners, and the best learners (Random Forests and XGBoost) were coupled with a genetic algorithm and Shapley additive explanations (SHAP) to determine critical descriptors for predicting anti-TTK bioactivity and for pharmacophore generation. Three successful pharmacophores were deduced and subsequently used for in silico screening of the NCI database. A total of 14 hits were evaluated in vitro for their anti-TTK bioactivities. One hit with a novel chemotype showed a reasonable dose-response curve with an experimental IC50 of 1.0 μM. The presented work indicates the validity of data augmentation using multiple docked poses for building successful machine learning models and pharmacophore hypotheses.
{"title":"Augmenting bioactivity by docking-generated multiple ligand poses to enhance machine learning and pharmacophore modelling: discovery of new TTK inhibitors as case study.","authors":"Amenah M Al-Imam, Safa Daoud, Ma'mon M Hatmal, Mutasem Omar Taha","doi":"10.1002/minf.202300022","DOIUrl":"https://doi.org/10.1002/minf.202300022","url":null,"abstract":"<p><p>Dual specificity protein kinase threonine/Tyrosine kinase (TTK) is one of the mitotic kinases. High levels of TTK are detected in several types of cancer. Hence, TTK inhibition is considered a promising therapeutic anti-cancer strategy. In this work, we used multiple docked poses of TTK inhibitors to augment training data for machine learning QSAR modeling. Ligand-Receptor Contacts Fingerprints and docking scoring values were used as descriptor variables. Escalating docking-scoring consensus levels were scanned against orthogonal machine learners, and the best learners (Random Forests and XGBoost) were coupled with genetic algorithm and Shapley additive explanations (SHAP) to determine critical descriptors for predicting anti-TTK bioactivity and for pharmacophore generation. Three successful pharmacophores were deduced and subsequently used for in silico screening against the NCI database. A total of 14 hits were evaluated in vitro for their anti-TTK bioactivities. One hit of novel chemotype showed reasonable dose-response curve with experimental IC<sub>50</sub> of 1.0 μM. 
The presented work indicates the validity of data augmentation using multiple docked poses for building successful machine learning models and pharmacophore hypotheses.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"42 6","pages":"e2300022"},"PeriodicalIF":3.6,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9675061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mael A Briand, Loïc Dreano, Ashenafi Legehar, Evgeni Grazhdankin, Leo Ghemtio, Henri Xhaard
Cooperative molecular contacts play an important role in protein structure and ligand binding. Here, we constructed a PostgreSQL database that stores structural information in the form of atomic environments and allows flexible mining of molecular contacts. Taking the Ser-His-Asp/Glu catalytic triad as a first test case, we demonstrate that the presence of a carboxylate oxygen atom in the vicinity of a His is associated with a shorter Ser-OH..N-His bond in the PDB30 subset. We also prospectively mine catalytic triads in unannotated proteins, suggesting catalytic functions for them. As a second test case, we demonstrate that this database system can include ligand atoms, represented by Sybyl atom types, by evaluating the proportion of counter-ions for ligand carboxylate oxygens.
{"title":"Exploring cooperative molecular contacts using a PostgreSQL database system.","authors":"Mael A Briand, Loïc Dreano, Ashenafi Legehar, Evgeni Grazhdankin, Leo Ghemtio, Henri Xhaard","doi":"10.1002/minf.202200235","DOIUrl":"https://doi.org/10.1002/minf.202200235","url":null,"abstract":"<p><p>Cooperative molecular contacts play an important role in protein structure and ligand binding. Here, we constructed a PostgreSQL database that stores structural information in the form of atomic environments and allows flexible mining of molecular contacts. Taking the Ser-His-Asp/Glu catalytic triad as a first test case, we demonstrate that the presence of a carboxylate oxygen atom in the vicinity of a His is associated with shorter Ser-OH..N-His bond in the PDB30 subset. We prospectively mine catalytic triads in unannotated proteins, suggesting catalytic functions for unannotated proteins. As a second test case, we demonstrate that this database system can include ligand atoms, represented by Sybyl atom types, by evaluating the proportion of counter-ions for ligand carboxylate oxygens.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"42 5","pages":"e2200235"},"PeriodicalIF":3.6,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9457184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
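The kind of contact mining described in the abstract above can be sketched as a self-join over an atom table. Everything below is an assumption made for illustration: the table layout, column names, coordinates, and the 3.5 Å cut-off are hypothetical, and Python's built-in SQLite stands in for PostgreSQL so the example runs without a server. The authors' schema of atomic environments is not described in this listing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE atom (
           atom_id INTEGER PRIMARY KEY, structure TEXT, residue TEXT,
           res_no INTEGER, atom_name TEXT, x REAL, y REAL, z REAL)"""
)
# Hypothetical coordinates (in angstroms) for a toy structure.
conn.executemany(
    "INSERT INTO atom VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "demo", "SER", 195, "OG", 0.0, 0.0, 0.0),
        (2, "demo", "HIS", 57, "NE2", 2.7, 0.0, 0.0),
        (3, "demo", "HIS", 57, "ND1", 4.8, 0.5, 0.0),
        (4, "demo", "ASP", 102, "OD2", 7.4, 0.9, 0.0),
        (5, "demo", "ASP", 30, "OD1", 20.0, 20.0, 20.0),
    ],
)

# Ser OG near His NE2, and the same His ND1 near an Asp/Glu carboxylate oxygen.
# Squared distances avoid needing a sqrt() function in SQL.
TRIAD_QUERY = """
SELECT ser.res_no, his_a.res_no, acid.res_no
FROM atom AS ser
JOIN atom AS his_a ON his_a.structure = ser.structure
JOIN atom AS his_b ON his_b.structure = his_a.structure
                  AND his_b.res_no = his_a.res_no
JOIN atom AS acid  ON acid.structure = his_b.structure
WHERE ser.residue = 'SER' AND ser.atom_name = 'OG'
  AND his_a.residue = 'HIS' AND his_a.atom_name = 'NE2'
  AND his_b.residue = 'HIS' AND his_b.atom_name = 'ND1'
  AND acid.residue IN ('ASP', 'GLU')
  AND acid.atom_name IN ('OD1', 'OD2', 'OE1', 'OE2')
  AND (ser.x - his_a.x) * (ser.x - his_a.x)
    + (ser.y - his_a.y) * (ser.y - his_a.y)
    + (ser.z - his_a.z) * (ser.z - his_a.z) <= :cut2
  AND (his_b.x - acid.x) * (his_b.x - acid.x)
    + (his_b.y - acid.y) * (his_b.y - acid.y)
    + (his_b.z - acid.z) * (his_b.z - acid.z) <= :cut2
"""

triads = conn.execute(TRIAD_QUERY, {"cut2": 3.5 ** 2}).fetchall()
print(triads)
```

A production schema would also need chain identifiers, alternate locations, and spatial indexing; those are left out to keep the join logic visible.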
The toxicity of compounds is closely related to the effectiveness and safety of drug development, and accurately predicting it is one of the most challenging tasks in medicinal chemistry and pharmacology. In this paper, we construct three types of single-task and multi-task models based on 2D and 3D descriptors, fingerprints, and molecular graphs, and then validate them with benchmark tests on the Tox21 Data Challenge. We found that multi-task learning, through its information-sharing mechanism, could address the imbalance problem of the Tox21 data sets to some extent, and that multi-task models generally predicted significantly better than single-task models. Given the complementarity of the different molecular representations and modeling algorithms, we attempted to integrate them into a robust Co-Model. Our Co-Model performs well on various evaluation metrics on the test set and also achieves a significant performance improvement over other models in the literature, which demonstrates its predictive power and robustness.
{"title":"Co-model for chemical toxicity prediction based on multi-task deep learning.","authors":"Yuan Yuan Li, Lingfeng Chen, Chengtao Pu, Chengdong Zang, YingChao Yan, Yadong Chen, Yanmin Zhang, Haichun Liu","doi":"10.1002/minf.202200257","DOIUrl":"https://doi.org/10.1002/minf.202200257","url":null,"abstract":"<p><p>The toxicity of compounds is closely related to the effectiveness and safety of drug development, and accurately predicting the toxicity of compounds is one of the most challenging tasks in medicinal chemistry and pharmacology. In this paper, we construct three types of models for single and multi-tasking based on 2D and 3D descriptors, fingerprints and molecular graphs, and then validate the models with benchmark tests on the Tox21 data challenge. We found that due to the information sharing mechanism of multi-task learning, it could address the imbalance problem of the Tox21 data sets to some extent, and the prediction performance of the multi-task was significantly improved compared with the single task in general. Given the complement of the different molecular representations and modeling algorithms, we attempted to integrate them into a robust Co-Model. 
Our Co-Model performs well in various evaluation metrics on the test set and also achieves significant performance improvement compared to other models in the literature, which clearly demonstrates its superior predictive power and robustness.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"42 5","pages":"e2200257"},"PeriodicalIF":3.6,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9510308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}