Relevance of Machine Learning to Predict the Inhibitory Activity of Small Thiazole Chemicals on Estrogen Receptor
Jayaprakash Venkatesan, Thangavelu Saravanan, Karuppaiyan Ravindran, Thangavelu Prabha, Selvaraj Jubie, Jayapalan Sudeepan, M V N L Chaitanya, Thangavel Sivakumar
Current Computer-Aided Drug Design, 19(1): 37-50 (2023). DOI: 10.2174/1573409919666221121141646
Abstract
Background: Drug discovery requires hybrid technologies to identify new chemical entities. One such strategy is quantitative structure-activity relationship (QSAR) modeling, which applies an artificial intelligence system to predict in silico how chemical modifications affect biological activity.
Aim: This study aimed to apply a machine learning approach, implemented as a new open-source Python script for data analysis, to the discovery of anticancer leads by building a QSAR model from 53 thiazole derivatives.
Methods: A Python script was run on the 53 small thiazole chemicals in the Google Colaboratory interface. A total of 82 CDK (Chemistry Development Kit) molecular descriptors were downloaded from the ChemDes web server and used in the study. After training, model performance was checked by cross-validation on the external test set.
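The Methods steps can be pictured as a principal-component regression. The sketch below is an illustration under stated assumptions, not the authors' script: it replaces the 53 x 82 CDK descriptor matrix with random placeholder numbers, assumes the descriptors are standardized before principal component analysis (PCA), keeps five components to mirror PC1 to PC5 in the Results, and uses a hypothetical 43/10 train/test split. The helpers `fit_pca`, `pca_transform`, `fit_ols`, and `r2` are hypothetical names, and plain NumPy stands in for the sklearn and statsmodels calls the authors used.

```python
import numpy as np


def fit_pca(X, n_components):
    """Hypothetical helper: standardize X column-wise, return (mean, std, principal axes)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # guard against constant descriptors
    Z = (X - mean) / std
    # Rows of Vt are principal axes, ordered by decreasing explained variance
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, Vt[:n_components]


def pca_transform(X, mean, std, axes):
    """Hypothetical helper: project standardized descriptors onto the retained axes."""
    return ((X - mean) / std) @ axes.T


def fit_ols(scores, y):
    """Hypothetical helper: ordinary least squares with an intercept -> (intercept, slopes)."""
    A = np.column_stack([np.ones(len(scores)), scores])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(0)
X = rng.normal(size=(53, 82))  # placeholder for 53 compounds x 82 CDK descriptors (NOT real data)
y = 4.8 + X[:, :5] @ rng.normal(scale=0.1, size=5) + rng.normal(scale=0.05, size=53)

idx = rng.permutation(53)
train, test = idx[:43], idx[43:]  # ASSUMPTION: 43/10 split, not stated in the abstract

mean, std, axes = fit_pca(X[train], n_components=5)  # PCA fitted on training compounds only
pcs_train = pca_transform(X[train], mean, std, axes)
intercept, coefs = fit_ols(pcs_train, y[train])

pcs_test = pca_transform(X[test], mean, std, axes)
y_pred = intercept + pcs_test @ coefs
print("intercept:", round(intercept, 5))
print("PC1..PC5 coefficients:", np.round(coefs, 5))
print("test-set R^2:", round(r2(y[test], y_pred), 4))
```

Because the training-set PC scores are centered, the fitted intercept equals the mean training activity; the five slopes play the role of the PC1 to PC5 coefficients reported in the Results.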
Results: Ordinary least squares (OLS) regression of the QSAR model gave R² = 0.542, F = 8.773, adjusted R² (Q²) = 0.481, and a standard error of 0.061. The regression coefficients (reg.coef_) were -0.00064 (PC1), -0.07753 (PC2), -0.09078 (PC3), -0.08986 (PC4), and 0.05044 (PC5), with an intercept (reg.intercept_) of 4.79279, obtained with the statsmodels formula module. Test-set predictions were made with multiple linear regression, support vector machine, and partial least squares regression models from the scikit-learn (sklearn) module, which gave model scores of 0.5424, 0.6422, and 0.6422, respectively.
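The adjusted R² and F statistic in the Results are fixed by R², the number of training compounds n, and the number of predictors p (the five principal components): adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1) and F = (R²/p) / ((1 - R²)/(n - p - 1)). The abstract does not give n, so the check below assumes n = 43, a value chosen only because it reproduces the reported numbers, and takes the MLR model score 0.5424 as the unrounded R². It is an arithmetic consistency check under those assumptions, not part of the published analysis.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for an OLS model with an intercept and p predictors fitted on n samples."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def f_statistic(r2: float, n: int, p: int) -> float:
    """Overall F statistic (regression vs. residual mean square) for the same model."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


R2 = 0.5424  # MLR model score reported in the abstract, used as unrounded R^2
P = 5        # predictors: PC1..PC5
N = 43       # ASSUMPTION: training-set size, not stated in the abstract

print(f"adjusted R^2 = {adjusted_r2(R2, N, P):.3f}")
print(f"F = {f_statistic(R2, N, P):.3f}")
```

With these inputs the functions give an adjusted R² of about 0.481 and an F of about 8.77, in line with the reported 0.481 and 8.773 to within the rounding of R².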
Conclusion: We conclude that the R² values (i.e., the model scores) obtained with this script using three different algorithms agree well, with little difference between them, and that the model may be useful in designing similar thiazole derivatives as anticancer agents.
About the journal:
Aims & Scope
Current Computer-Aided Drug Design aims to publish all the latest developments in drug design based on computational techniques. The field of computer-aided drug design has had extensive impact in the area of drug design.
Current Computer-Aided Drug Design is an essential journal for all medicinal chemists who wish to keep up to date with the latest important developments in computer-aided methodologies and their applications in drug discovery. Each issue contains a series of timely, in-depth reviews, original research articles, and letter articles written by leaders in the field, covering a range of computational techniques for drug design, screening, and ADME studies; theoretical and computational chemistry; computer and molecular graphics; molecular modeling; protein engineering; expert systems; general structure-property relationships; molecular dynamics; and chemical database development and usage, providing excellent rationales for drug development.