Title: Building a ML-based QSAR model for predicting the bioactivity of therapeutically active drug class with imidazole scaffold
Authors: Komal Singh, Irina Ghosh, Venkatesan Jayaprakash, Sudeepan Jayapalan
Journal: European Journal of Medicinal Chemistry Reports, volume 11, Article 100148
DOI: 10.1016/j.ejmcr.2024.100148
Publication date: 2024-03-23
URL: https://www.sciencedirect.com/science/article/pii/S2772417424000207
Citations: 0
Abstract
Human immunodeficiency virus (HIV), a retrovirus, causes AIDS, a chronic disease of the immune system. HIV weakens the immune system and so impairs the body's ability to fight disease and infection. Reverse transcriptase (RT) is an enzyme essential for HIV replication. RT inhibitors (RTIs) are a class of antiretroviral drugs that target HIV's RT enzyme, blocking its ability to convert viral RNA into DNA. The RT-1 enzyme has been found to be inhibited by imidazole, which attaches to the enzyme's active site and prevents it from performing its usual activity. As a result, viral replication is inhibited, which can help slow the course of HIV and other retroviral diseases. Computational tools allow researchers to simulate and analyze a drug's behaviour in a virtual environment, providing insight into its pharmacological properties, efficacy, and safety. QSAR modelling uses machine learning methods to build predictive models from datasets of chemical compounds and their associated biological activity. Here, a comparative analysis of the performance of four algorithms for the imidazole scaffold is reported. Support Vector Regression (SVR), Random Forest Regression (RFR), Decision Tree Regression (DTR) and Hist Gradient Boosting Regression (HGBR) gave promising results, with R² values of 0.905, 0.993, 0.688 and 0.921, respectively, for the training set and 0.843, 0.977, 0.567 and 0.880 for the test set. The best-performing model, RFR, was validated by applying the developed RFR code to randomly selected compounds, giving an error percentage of only about 0.151%. Judged by the R² values, the RFR and HGBR models fit the variables better than the other models, making them potential models for predicting the activity of novel antiviral compounds.