Classification of HIV-1 protease crystal structures using Random Forest, linear discriminant analysis and logistic regression

Gene M. Ko, A. Reddy, Sunil Kumar, S. A. Bailey, R. Garg
Published in: 2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology
Publication date: 2010-05-02
DOI: https://doi.org/10.1109/CIBCB.2010.5510465
Citations: 1

Abstract

The present study develops a classification model that relates the binding pockets of 70 HIV-1 protease crystal structures, described by their structural descriptors, to their complexed HIV-1 protease inhibitors. A Random Forest classification model is used to reduce the chemical descriptor space from 456 descriptors to the 12 most relevant ones, ranked by the Gini importance measure. The 12 selected descriptors are then used to develop classification models using linear discriminant analysis (LDA) and logistic regression (LR). The top eight descriptors produced the best LDA model, with an overall error of 30% and a leave-one-out cross-validation error of 44.29%, while the top five descriptors produced the best LR model, with an overall error of 28.57% and a leave-one-out cross-validation error of 41.43%. Hierarchical clustering was performed on the top five and top eight descriptors to verify whether the descriptors selected by Random Forest can group the binding pockets according to their complexed ligands. The selected descriptors could play a crucial role in understanding the HIV-1 protease binding pocket structure in terms of its chemical descriptors.
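The leave-one-out cross-validation errors quoted in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes the reported percentages are fractions of the 70 structures, and it substitutes a toy nearest-centroid classifier on synthetic two-dimensional data for the LDA and LR models, whose descriptors and class labels are not given here. All function and variable names are hypothetical.

```python
import random
from statistics import mean


def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x.

    Toy stand-in for LDA/LR; not the classifier used in the paper.
    """
    centroids = {}
    for label in set(train_y):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # sorted() makes tie-breaking deterministic
    return min(sorted(centroids), key=lambda lab: sq_dist(centroids[lab], x))


def loo_error(X, y):
    """Leave-one-out error: hold out each sample once, train on the rest."""
    wrong = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) != y[i]:
            wrong += 1
    return wrong / len(X)


def misclassified_count(error_percent, n_samples=70):
    """Convert a reported error percentage into a whole number of samples
    (assumes the percentage is a fraction of n_samples)."""
    return round(error_percent / 100 * n_samples)


# Synthetic data: two well-separated classes of 35 points each (assumption,
# chosen only so the total matches the 70 structures in the abstract).
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(35)] + \
    [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(35)]
y = ["A"] * 35 + ["B"] * 35

toy_loo = loo_error(X, y)

# Error percentages reported in the abstract, expressed as structure counts.
counts = {p: misclassified_count(p) for p in (30.0, 44.29, 28.57, 41.43)}
```

Under the stated assumption, the reported errors correspond to 21 and 31 of 70 structures for the LDA model (overall and leave-one-out) and 20 and 29 of 70 for the LR model.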