
Molecular Informatics: Latest Articles

The VEGA web service: multipurpose online tools for molecular modelling and docking analyses.
IF 3.6, Zone 4 (Medicine), Q1 Chemistry, Pub Date: 2023-07-01, DOI: 10.1002/minf.202300018
Alessandro Pedretti, Serena Vittorio, Emanuela Sabato, Giulio Vistoli, Angelica Mazzolari

The paper presents the VEGA Online web service, which includes a set of freely available tools deriving from the development of the VEGA suite of programs. In detail, the paper focuses on two tools: the VEGA Web Edition (WE) and the Score tool. The former is a versatile file format converter including relevant features for 2D/3D conversion, for surface mapping and for editing/preparing input files. The Score application allows rescoring docking poses and in particular includes the MLP Interactions Scores (MLPInS) for describing hydrophobic interactions. To the best of our knowledge, this web service is the only available resource by which one can calculate both the virtual log P of a given input molecule according to the MLP approach and the corresponding MLP surface.

Citations: 0
Application of automated machine learning in the identification of multi-target-directed ligands blocking PDE4B, PDE8A, and TRPA1 with potential use in the treatment of asthma and COPD.
IF 3.6, Zone 4 (Medicine), Q1 Chemistry, Pub Date: 2023-07-01, DOI: 10.1002/minf.202200214
Alicja Gawalska, Natalia Czub, Michał Sapa, Marcin Kołaczkowski, Adam Bucki, Aleksander Mendyk

Asthma and COPD are characterized by complex pathophysiology associated with chronic inflammation, bronchoconstriction, and bronchial hyperresponsiveness resulting in airway remodeling. A possible comprehensive solution that could fully counteract the pathological processes of both diseases is the use of rationally designed multi-target-directed ligands (MTDLs) that combine PDE4B and PDE8A inhibition with TRPA1 blockade. The aim of the study was to develop AutoML models to search for novel MTDL chemotypes blocking PDE4B, PDE8A, and TRPA1. Regression models were developed for each of the biological targets using "mljar-supervised". Based on these models, virtual screenings of commercially available compounds derived from the ZINC15 database were performed. A group of compounds common to the top results was selected as potential novel chemotypes of multifunctional ligands. This study represents the first attempt to discover potential MTDLs inhibiting these three biological targets. The obtained results demonstrate the usefulness of AutoML methodology in the identification of hits from large compound databases.
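
The consensus step described in this abstract, keeping only compounds that rank near the top for every target, can be sketched as follows. This is an illustrative sketch and not the authors' pipeline: the compound ids, the predicted activity values, the cutoff `k`, and the helper names `top_k` and `consensus_hits` are all hypothetical, and the predictions are assumed to come from already trained per-target regression models.

```python
def top_k(predictions, k):
    """Return the ids of the k compounds with the highest predicted activity."""
    ranked = sorted(predictions, key=predictions.get, reverse=True)
    return set(ranked[:k])


def consensus_hits(per_target, k):
    """Keep only compounds that appear in the top-k list of every target."""
    top_sets = [top_k(predictions, k) for predictions in per_target.values()]
    return set.intersection(*top_sets)


# Hypothetical predicted activities (e.g. pIC50) from three per-target models.
per_target = {
    "PDE4B": {"cpd_a": 7.9, "cpd_b": 6.1, "cpd_c": 8.2, "cpd_d": 5.0},
    "PDE8A": {"cpd_a": 7.1, "cpd_b": 7.8, "cpd_c": 7.5, "cpd_d": 4.2},
    "TRPA1": {"cpd_a": 6.9, "cpd_b": 5.5, "cpd_c": 7.0, "cpd_d": 7.4},
}

hits = consensus_hits(per_target, k=2)
print(hits)
```

With these made-up numbers only `cpd_c` is in the top two for all three targets. In a real screen the cutoff would more likely be a fraction of a large library than a fixed small `k`.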

Citations: 0
In silico prediction of drug-induced liver injury with a complementary integration strategy based on hybrid representation.
IF 3.6, Zone 4 (Medicine), Q1 Chemistry, Pub Date: 2023-07-01, DOI: 10.1002/minf.202200284
Yaxin Gu, Yimeng Wang, Zengrui Wu, Weihua Li, Guixia Liu, Yun Tang

Drug-induced liver injury (DILI) is one of the major causes of drug withdrawals, acute liver injury and black-box warnings. Clinical diagnosis of DILI is a huge challenge due to the complex pathogenesis and lack of specific biomarkers. In recent years, machine learning methods have been used for DILI risk assessment, but the models do not generalize satisfactorily. In this study, we constructed a large DILI data set and proposed an integration strategy based on hybrid representations for DILI prediction (HR-DILI). Benefiting from feature integration, the hybrid graph neural network models outperformed single representation-based models, among which hybrid-GraphSAGE showed balanced performance in cross-validation with an AUC (area under the curve) of 0.804±0.019. In the external validation set, HR-DILI improved the AUC by 6.4%-35.9% compared to the base model with a single representation. Compared with published DILI prediction models, HR-DILI showed better and more balanced performance. The performance of local models for natural products and synthetic compounds was also explored. Furthermore, eight key descriptors and six structural alerts associated with DILI were analyzed to increase the interpretability of the models. The improved performance of HR-DILI indicated that it would provide reliable guidance for DILI risk assessment.
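
For readers less familiar with the AUC metric reported here, the following is a minimal sketch of how an area under the ROC curve can be computed from binary labels and predicted scores, using the rank (Mann-Whitney) formulation. The labels and scores are invented for illustration and the function name `roc_auc` is hypothetical; nothing here reproduces the HR-DILI models or their data.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented example: 1 = hepatotoxic, 0 = non-hepatotoxic.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.5]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

In this toy case 7 of the 9 positive/negative pairs are ordered correctly, so the AUC is 7/9. The pairwise loop is quadratic and only meant to make the definition explicit; library implementations use sorting instead.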

Citations: 0
Exploring activity landscapes with extended similarity: is Tanimoto enough?
IF 3.6, Zone 4 (Medicine), Q1 Chemistry, Pub Date: 2023-07-01, DOI: 10.1002/minf.202300056
Timothy B Dunn, Edgar López-López, Taewon David Kim, José L Medina-Franco, Ramón Alain Miranda-Quintana

Understanding structure-activity landscapes is essential in drug discovery. It has also been shown that the presence of activity cliffs in compound data sets can have a substantial impact not only on design progress but also on the predictive ability of machine learning models. With the continued expansion of the chemical space and the currently available large and ultra-large libraries, it is imperative to implement efficient tools to analyze the activity landscape of compound data sets rapidly. The goal of this study is to show the applicability of the n-ary indices to quantify the structure-activity landscapes of large compound data sets rapidly and efficiently using different types of structural representation. We also discuss how a recently introduced medoid algorithm provides the foundation for finding optimum correlations between similarity measures and structure-activity rankings. The applicability of the n-ary indices and the medoid algorithm is shown by analyzing the activity landscape of 10 compound data sets with pharmaceutical relevance using three fingerprints of different designs, 16 extended similarity indices, and 11 coincidence thresholds.
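
To make the idea of an n-ary (extended) similarity index with a coincidence threshold more concrete, here is a simplified, unweighted sketch of a Tanimoto-like index computed over a whole set of binary fingerprints at once. It is one plain reading of the column-sum idea and is not taken from the paper: the function name `extended_tanimoto`, the toy fingerprints, and the omission of any weighting functions are all assumptions made for illustration.

```python
def extended_tanimoto(fingerprints, threshold=0):
    """Unweighted Tanimoto-like similarity of n binary fingerprints compared at once.

    For each bit position the column sum c is compared with n:
    |2c - n| > threshold and mostly ones  -> counted as a 1-similarity position
    |2c - n| > threshold and mostly zeros -> 0-similarity, ignored (as in Tanimoto)
    otherwise                             -> counted as a dissimilarity position
    """
    n = len(fingerprints)
    if n < 2:
        raise ValueError("need at least two fingerprints")
    one_similar = 0
    dissimilar = 0
    for column in zip(*fingerprints):
        c = sum(column)
        if abs(2 * c - n) > threshold:
            if 2 * c > n:
                one_similar += 1
        else:
            dissimilar += 1
    denominator = one_similar + dissimilar
    return one_similar / denominator if denominator else 0.0


# Toy 6-bit fingerprints (hypothetical).
fp_a = [1, 1, 0, 1, 0, 0]
fp_b = [1, 0, 0, 1, 1, 0]
fp_c = [1, 1, 0, 0, 1, 0]

pair_value = extended_tanimoto([fp_a, fp_b])
triple_value = extended_tanimoto([fp_a, fp_b, fp_c], threshold=1)
print(pair_value, triple_value)
```

For two fingerprints and a threshold of 0 this reduces to the ordinary pairwise Tanimoto coefficient (shared on-bits divided by bits set in either fingerprint). The cost is linear in the number of fingerprints because only column sums are needed, which is the property that makes this family of indices attractive for large data sets.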

Citations: 1
Identification of a PD1/PD-L1 inhibitor by structure-based pharmacophore modelling, virtual screening, molecular docking and biological evaluation.
IF 3.6, Zone 4 (Medicine), Q1 Chemistry, Pub Date: 2023-06-01, DOI: 10.1002/minf.202200254
Gopi Mohan C, Anju Pushkaran, Kumaran K, Ann Maria T, Raja Biswas

PD-1/PD-L1 is a critical druggable target for immunotherapy against sepsis. The chemoinformatics workflow involved structure-based 3D pharmacophore model development followed by virtual screening of small molecule databases to identify small molecules that inhibit the PD-L1 pathway. Using in silico methods, Raltitrexed and Safinamide were identified as potent repurposed drugs, together with three other compounds from the Specs database. These compounds were screened based on the pharmacophore fit score and binding affinity towards the active site of the PD-L1 protein. In silico pharmacokinetic profiling of these screened compounds was done to test their biological activity. Next, the best four virtually screened hits were experimentally validated in vitro for their hemocompatibility and cytotoxicity. Among these, Raltitrexed, Safinamide and the Specs compound AK-968/40642641 effectively increased the proliferation of immune cells and IFN-γ production. These compounds can act as potent PD-L1 inhibitors for adjuvant therapy against sepsis.

Citations: 1
Compression of molecular fingerprints with autoencoder networks.
IF 3.6, Zone 4 (Medicine), Q1 Chemistry, Pub Date: 2023-06-01, DOI: 10.1002/minf.202300059
Gisbert Schneider, Agnieszka Ilnicka

Several binary molecular fingerprints were compressed using an autoencoder neural network. We analyzed the impact of compression on fingerprint performance in downstream classification and regression tasks. Classifiers trained on compressed fingerprints were negligibly affected. Regression models benefitted from compression, especially of long fingerprints (Morgan, RDK). However, their performance dropped rapidly for compression levels exceeding 90 %. Property co-learning positively influenced the predictive power of the compressed fingerprints, with a mean score improvement up to 20 %, suggesting that autoencoder compression with property co-learning biases the molecular representation toward the predicted target, facilitating downstream training.
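
As a small worked example of what a given compression level means for the size of the autoencoder bottleneck, the sketch below converts a fingerprint length and a compression level into a latent dimension. It assumes, and this is only an assumption since the abstract does not define the term, that the compression level is 1 minus the ratio of latent size to input size. The function name `latent_size` and the fingerprint lengths are hypothetical.

```python
def latent_size(input_bits, compression_level):
    """Latent dimension implied by a compression level in [0, 1).

    Assumes compression_level = 1 - latent_size / input_bits.
    """
    if not 0.0 <= compression_level < 1.0:
        raise ValueError("compression_level must be in [0, 1)")
    return max(1, round(input_bits * (1.0 - compression_level)))


# A 2048-bit fingerprint compressed by 90% keeps about a tenth of its length.
print(latent_size(2048, 0.90))
print(latent_size(1024, 0.50))
```

Under this assumed definition, the 90% level mentioned in the abstract corresponds to a bottleneck of roughly 205 units for a 2048-bit fingerprint.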

Citations: 2
Overproduce and select, or determine optimal molecular descriptor subset via configuration space optimization? Application to the prediction of ecotoxicological endpoints.
IF 3.6, Zone 4 (Medicine), Q1 Chemistry, Pub Date: 2023-06-01, DOI: 10.1002/minf.202200227
Luis A García-González, Yovani Marrero-Ponce, Carlos A Brizuela, César R García-Jacas

Predicting the likely biological activity (or property) of compounds is a fundamental and challenging task in the drug discovery process. Current computational methodologies aim to improve their predictive accuracies by using deep learning (DL) approaches. However, non-DL based approaches have been shown to be the most suitable for small- and medium-sized chemical datasets. In this approach, an initial universe of molecular descriptors (MDs) is first calculated, then different feature selection algorithms are applied, and finally, one or several predictive models are built. Herein we demonstrate that this traditional approach may miss relevant information by assuming that the initial universe of MDs codifies all relevant aspects for the respective learning task. We argue that this limitation is mainly due to the constrained intervals of the parameters used in the algorithms that compute MDs, parameters that define the Descriptor Configuration Space (DCS). We propose to relax these constraints in an open DCS approach, so that a larger universe of MDs can be initially considered. We model the generation of MDs as a multicriteria optimization problem and tackle it with a variant of the standard genetic algorithm. As a novel component, the fitness function is computed by aggregating four criteria via the Choquet integral. Experimental results show that the proposed approach generates a meaningful DCS, improving on state-of-the-art approaches in most of the benchmark chemical datasets considered.
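
The aggregation of several criteria via the Choquet integral can be illustrated with the following minimal sketch of the discrete Choquet integral. It is not the authors' fitness function: the two criterion names, their scores, and the capacity (fuzzy measure) values are invented, and only two criteria are used instead of four to keep the capacity table short.

```python
def choquet_integral(scores, capacity):
    """Discrete Choquet integral of criterion scores with respect to a capacity.

    scores:   dict mapping criterion name -> non-negative score
    capacity: dict mapping frozenset of criteria -> weight in [0, 1],
              with the full set of criteria mapped to 1.0
    """
    ordered = sorted(scores, key=scores.get)  # criteria in ascending score order
    total = 0.0
    previous = 0.0
    for i, criterion in enumerate(ordered):
        # Coalition of all criteria whose score is at least the current one.
        coalition = frozenset(ordered[i:])
        total += (scores[criterion] - previous) * capacity[coalition]
        previous = scores[criterion]
    return total


# Hypothetical criteria and capacity.
scores = {"accuracy": 0.8, "diversity": 0.4}
capacity = {
    frozenset({"accuracy", "diversity"}): 1.0,
    frozenset({"accuracy"}): 0.3,
    frozenset({"diversity"}): 0.5,
}

fitness = choquet_integral(scores, capacity)
print(round(fitness, 2))
```

Here the result is 0.4 * 1.0 + (0.8 - 0.4) * 0.3 = 0.52. Unlike a weighted sum, the capacity assigns a weight to every coalition of criteria, so interactions between criteria (redundancy or synergy) can be expressed.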

Citations: 2
Gas-to-ionic liquid partition: QSPR modeling and mechanistic interpretation.
IF 3.6, Zone 4 (Medicine), Q1 Chemistry, Pub Date: 2023-06-01, DOI: 10.1002/minf.202200223
Jia-Xi Chang, Jian-Wei Zou, Chao-Yuan Lou, Jia-Xin Ye, Rui Feng, Zi-Yuan Li, Gui-Xiang Hu

The present work explores quantitative structure-property relationships for gas-to-ionic liquid partition coefficients ($\log K_{ILA}$). A series of linear models was first established for the representative dataset (IL01). The optimal model was a four-parameter equation (1Ed) consisting of two electrostatic potential-based descriptors ($\Sigma V_{s,ind}^{-}$ and $V_{s,max}$), one 2D matrix-based descriptor (J_D/Dt) and the dipole moment (μ). All four descriptors introduced in the model can be related, directly or indirectly, to parameters of Abraham's linear solvation energy relationship (LSER) or its theoretical alternatives, which gives the model good interpretability. A Gaussian process was used to build the nonlinear model. Systematic validation, including 5-fold cross-validation for the training set, validation on the test set, and a more rigorous Monte Carlo cross-validation, was performed to verify the reliability of the constructed models. The applicability domain of the model was evaluated, and the Williams plot revealed that the model can be used to predict the $\log K_{ILA}$ values of structurally diverse solutes. The other 13 datasets were processed in the same way, and linear models with expressions similar to equation 1Ed were obtained for all of them. These models, whether linear or nonlinear, give satisfactory statistical results, which confirms the universality of the method adopted in this study for QSPR modeling of gas-to-IL partition.
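
Monte Carlo cross-validation, mentioned here as the more rigorous validation scheme, repeatedly draws random train/test partitions instead of using fixed folds. The sketch below only generates such partitions; the model fitting and scoring are left out. The function name `monte_carlo_splits`, the number of rounds, the test fraction, and the seed are illustrative assumptions, not values from the paper.

```python
import random


def monte_carlo_splits(n_samples, n_rounds, test_fraction, seed=0):
    """Yield (train_indices, test_indices) for repeated random splits.

    Unlike k-fold cross-validation, the test sets of different rounds
    may overlap, and the number of rounds is chosen freely.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    indices = list(range(n_samples))
    for _ in range(n_rounds):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield sorted(shuffled[n_test:]), sorted(shuffled[:n_test])


# Example: 10 solutes, 5 random 80/20 partitions.
for train_idx, test_idx in monte_carlo_splits(10, n_rounds=5, test_fraction=0.2):
    print(test_idx)
```

In practice each round would fit the model on the training indices and record a statistic such as the test-set RMSE, and the distribution of that statistic over many rounds would be reported.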

Citations: 0
Augmenting bioactivity by docking-generated multiple ligand poses to enhance machine learning and pharmacophore modelling: discovery of new TTK inhibitors as case study.
IF 3.6, Zone 4 (Medicine), Q1 Chemistry, Pub Date: 2023-06-01, DOI: 10.1002/minf.202300022
Amenah M Al-Imam, Safa Daoud, Ma'mon M Hatmal, Mutasem Omar Taha

Dual specificity protein kinase threonine/tyrosine kinase (TTK) is one of the mitotic kinases. High levels of TTK are detected in several types of cancer. Hence, TTK inhibition is considered a promising therapeutic anti-cancer strategy. In this work, we used multiple docked poses of TTK inhibitors to augment training data for machine learning QSAR modeling. Ligand-Receptor Contacts Fingerprints and docking scoring values were used as descriptor variables. Escalating docking-scoring consensus levels were scanned against orthogonal machine learners, and the best learners (Random Forests and XGBoost) were coupled with a genetic algorithm and Shapley additive explanations (SHAP) to determine critical descriptors for predicting anti-TTK bioactivity and for pharmacophore generation. Three successful pharmacophores were deduced and subsequently used for in silico screening against the NCI database. A total of 14 hits were evaluated in vitro for their anti-TTK bioactivities. One hit with a novel chemotype showed a reasonable dose-response curve, with an experimental IC50 of 1.0 μM. The presented work indicates the validity of data augmentation using multiple docked poses for building successful machine learning models and pharmacophore hypotheses.
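
The data augmentation idea, treating every docked pose of a ligand as a separate training example that inherits the ligand's measured bioactivity, can be sketched as below. The record layout, the ligand names, the contact-fingerprint bits, and the activity values are all hypothetical and only illustrate the bookkeeping; this is not the authors' workflow.

```python
def augment_with_poses(ligands):
    """Expand each ligand into one training row per docked pose.

    Every pose row carries the pose's own descriptor vector but the
    bioactivity label measured for the parent ligand.
    """
    rows = []
    for ligand in ligands:
        for pose_index, features in enumerate(ligand["pose_fingerprints"]):
            rows.append(
                {
                    "ligand": ligand["name"],
                    "pose": pose_index,
                    "features": features,
                    "activity": ligand["activity"],
                }
            )
    return rows


# Hypothetical ligands with 2 and 3 docked poses (toy contact fingerprints).
ligands = [
    {"name": "lig_1", "activity": 7.2, "pose_fingerprints": [[1, 0, 1], [1, 1, 0]]},
    {"name": "lig_2", "activity": 5.4, "pose_fingerprints": [[0, 0, 1], [0, 1, 1], [1, 0, 0]]},
]

rows = augment_with_poses(ligands)
print(len(rows))
```

Two ligands become five training rows in this toy case. One design point worth keeping in mind with this kind of augmentation is that all poses of a ligand should stay on the same side of any train/test split, since they share a label; the `ligand` field is kept in each row for that purpose.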

Citations: 0
Co-model for chemical toxicity prediction based on multi-task deep learning.
IF 3.6 Zone 4 Medicine Q1 Chemistry Pub Date: 2023-05-01 DOI: 10.1002/minf.202200257
Yuan Yuan Li, Lingfeng Chen, Chengtao Pu, Chengdong Zang, YingChao Yan, Yadong Chen, Yanmin Zhang, Haichun Liu

The toxicity of compounds is closely tied to the effectiveness and safety of drug development, and accurately predicting it is one of the most challenging tasks in medicinal chemistry and pharmacology. In this paper, we construct three types of single-task and multi-task models based on 2D and 3D descriptors, fingerprints, and molecular graphs, and then validate the models with benchmark tests on the Tox21 data challenge. We found that the information-sharing mechanism of multi-task learning could address the imbalance problem of the Tox21 data sets to some extent, and that the multi-task models generally predicted significantly better than the single-task models. Given the complementarity of the different molecular representations and modeling algorithms, we attempted to integrate them into a robust Co-Model. Our Co-Model performs well on various evaluation metrics on the test set and achieves a significant performance improvement over other models in the literature, which demonstrates its predictive power and robustness.
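The abstract does not say how the individual models are merged into the Co-Model. As a rough sketch of one common option, the snippet below averages the per-task probabilities predicted by three models. The function name `co_model_average`, the weighting scheme, and the example probabilities are hypothetical and are not taken from the paper.

```python
def co_model_average(per_model_probs, weights=None):
    """Combine per-task probabilities from several models by weighted mean.

    per_model_probs: list of lists, one inner list of task probabilities
    per model; all inner lists must have the same length.
    weights: optional list of model weights that sum to 1
    (defaults to equal weights).
    """
    n_models = len(per_model_probs)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    n_tasks = len(per_model_probs[0])
    return [
        sum(w * probs[t] for w, probs in zip(weights, per_model_probs))
        for t in range(n_tasks)
    ]


# Hypothetical predictions for one compound on three toxicity tasks.
descriptor_model = [0.80, 0.10, 0.40]
fingerprint_model = [0.70, 0.20, 0.60]
graph_model = [0.90, 0.30, 0.50]

consensus = co_model_average(
    [descriptor_model, fingerprint_model, graph_model]
)
```

Equal weighting is the simplest choice. Weights tuned on a validation set, or a stacked meta-learner, would be alternatives if one representation is clearly stronger on a given task.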

Citations: 0