Quantitative Structure-Activity Relationships of Estrogen Receptor Alpha Based on Molecular Descriptors Selection and Extreme Gradient Boosting
Authors: Shaotong Liu, Zhewei Xu, Dongsheng Ye
Published in: Proceedings of the 2023 7th International Conference on Machine Learning and Soft Computing
Publication date: 2023-01-05
DOI: 10.1145/3583788.3583807 (https://doi.org/10.1145/3583788.3583807)
Citations: 0
Abstract
Quantitative structure-activity relationship (QSAR) modelling, which aims to estimate the bioactivity of compounds against estrogen receptor alpha (ERα) from their chemical features, is a fundamental part of drug discovery for breast cancer treatment. Because the data properties vary widely, building a suitable QSAR model is challenging. A further difficulty lies in the complexity of the compounds' molecular descriptors, which makes it hard to screen for robust ones. Previous studies select molecular descriptors manually, based on expert knowledge and experience; such selections are highly subjective and can yield ineffective descriptors. This paper presents a novel approach that addresses these problems in the context of regression modelling and feature selection. First, two filter and two embedded scoring metrics are proposed to jointly rank and select the most relevant and robust molecular descriptors. The selected features are then used to build a supervised, data-driven model with the eXtreme Gradient Boosting (XGBoost) algorithm. Experimental results show that the selected molecular descriptors give good predictions of the target ERα bioactivity and that the regression approach outperforms formal models.
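The joint ranking step described in the abstract could be sketched roughly as follows. The abstract does not name the four scoring metrics or say how they are combined, so this is only an illustration under stated assumptions: the metric labels (`filter_1`, `embedded_1`, and so on), the descriptor names (`d1` to `d4`), the score values, and the mean-rank aggregation rule are all hypothetical and are not taken from the paper.

```python
from statistics import mean


def rank_scores(scores):
    """Rank descriptors under one metric (1 = best), assuming higher scores are better."""
    ordered = sorted(scores, key=lambda name: scores[name], reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def select_descriptors(metric_scores, k):
    """Average each descriptor's rank across all metrics and keep the k best.

    Mean-rank aggregation is an assumption made for this sketch; the paper
    may combine its four metrics differently.
    """
    per_metric_ranks = [rank_scores(scores) for scores in metric_scores.values()]
    names = list(per_metric_ranks[0])
    mean_rank = {n: mean(ranks[n] for ranks in per_metric_ranks) for n in names}
    # Ties in mean rank are broken alphabetically so the result is deterministic.
    return sorted(names, key=lambda n: (mean_rank[n], n))[:k]


# Hypothetical scores for four descriptors under two filter and two embedded metrics.
metric_scores = {
    "filter_1": {"d1": 0.9, "d2": 0.5, "d3": 0.1, "d4": 0.3},
    "filter_2": {"d1": 0.7, "d2": 0.8, "d3": 0.2, "d4": 0.1},
    "embedded_1": {"d1": 0.4, "d2": 0.3, "d3": 0.2, "d4": 0.1},
    "embedded_2": {"d1": 0.5, "d2": 0.2, "d3": 0.3, "d4": 0.1},
}

selected = select_descriptors(metric_scores, k=2)
print(selected)
```

In a full pipeline, the columns named in `selected` would then be passed to a gradient-boosted regressor, for example `xgboost.XGBRegressor`, to predict ERα bioactivity; that training step is left out here to keep the sketch dependency-free.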