MEF-AlloSite: an accurate and robust Multimodel Ensemble Feature selection for the Allosteric Site identification model

IF 5.7 | Zone 2, Chemistry | Q1 CHEMISTRY, MULTIDISCIPLINARY | Journal of Cheminformatics | Pub Date: 2024-10-23 | DOI: 10.1186/s13321-024-00882-5

Sadettin Y. Ugurlu, David McDonald, Shan He



Abstract

Allostery is a crucial mechanism for controlling protein activity. Compared with orthosteric ligands, allosteric modulators can offer several benefits, such as increased selectivity and saturability of their effect. Identifying new allosteric sites opens prospects for innovative medications and deepens our understanding of fundamental biological mechanisms. Allosteric sites are increasingly being found across protein families through various techniques, including machine learning, which opens up possibilities for entirely novel medications with diverse chemical structures. However, machine learning methods such as PASSer show limited efficacy in identifying allosteric binding sites when they rely solely on 3D structural information.

Scientific Contribution

Integrating supporting amino-acid–based information with 3D structural knowledge before feature selection is advantageous for allosteric binding site identification, because it can improve both accuracy and robustness. We therefore collected 9460 relevant and diverse features from the literature to characterise pockets and developed an accurate and robust model called Multimodel Ensemble Feature Selection for Allosteric Site Identification (MEF-AlloSite). The model applies a multimodel ensemble feature selection technique, suited to the small training set of only 90 proteins, which improved performance in allosteric binding site identification by selecting promising features from the 9460 candidates. Analysing the selected features also sheds light on their relationship to allosteric binding sites, and thus on the complex allostery of proteins. MEF-AlloSite and state-of-the-art allosteric site identification methods such as PASSer2.0 and PASSerRank were tested on three test cases 51 times, each with a different split of the training set. Student’s t-test and Cohen’s d were used to compare the distributions of average precision and ROC AUC scores. Across the three test cases, most p-values ($< 0.05$) and the majority of Cohen’s d values ($> 0.5$) indicate that MEF-AlloSite’s 1–6% higher mean average precision and ROC AUC, relative to state-of-the-art allosteric site identification methods, is statistically significant.
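As an illustration of the statistical comparison described above, the following is a minimal sketch of how two methods' score distributions over 51 repeats could be compared with Student's t statistic and Cohen's d. It assumes the independent two-sample, pooled-variance form of both quantities; the abstract does not say which variant the authors used. The function names and the score lists are hypothetical and are not results from the paper.

```python
import math
import statistics


def pooled_sd(a, b):
    """Pooled sample standard deviation of two independent samples."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # n-1 denominators
    return math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))


def cohens_d(a, b):
    """Cohen's d: difference in means expressed in units of the pooled SD."""
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd(a, b)


def student_t(a, b):
    """Two-sample Student's t statistic (equal variances) and degrees of freedom."""
    na, nb = len(a), len(b)
    se = pooled_sd(a, b) * math.sqrt(1 / na + 1 / nb)
    return (statistics.mean(a) - statistics.mean(b)) / se, na + nb - 2


# Hypothetical scores: 51 repeats per method, generated deterministically
# for illustration only. These are NOT results from the paper.
method_a = [0.80 + 0.01 * math.sin(i) for i in range(51)]
method_b = [0.78 + 0.01 * math.cos(i) for i in range(51)]

d = cohens_d(method_a, method_b)
t, df = student_t(method_a, method_b)
print(f"Cohen's d = {d:.2f}, t = {t:.2f}, df = {df}")
```

The p-value follows from the t distribution with `df` degrees of freedom. The standard library has no t distribution, so in practice one might obtain it from `scipy.stats.ttest_ind`, which computes the pooled-variance test by default. The effect size is then read against the $> 0.5$ threshold quoted in the abstract.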




Source journal

Journal of Cheminformatics (Chemistry, Multidisciplinary; Computer Science, Information Systems)

CiteScore: 14.10

Self-citation rate: 7.00%

Review time: 3 months

Journal description: Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling. Coverage includes, but is not limited to: chemical information systems, software and databases, and molecular modelling, chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases, computer and molecular graphics, computer-aided molecular design, expert systems, QSAR, and data mining techniques.