Quantitative Structure-Activity Relationships of Estrogen Receptor Alpha Based on Molecular Descriptors Selection and Extreme Gradient Boosting
Shaotong Liu, Zhewei Xu, Dongsheng Ye
Proceedings of the 2023 7th International Conference on Machine Learning and Soft Computing, published 2023-01-05
DOI: 10.1145/3583788.3583807 (https://doi.org/10.1145/3583788.3583807)
Quantitative structure-activity relationship (QSAR) modelling, which aims to estimate the estrogen receptor alpha (ERα) bioactivity of compounds from their chemical features, is a fundamental part of drug discovery for breast cancer treatment. Because the properties of the data vary widely, building a suitable QSAR model is a challenging task. A further challenge lies in the complexity of compound molecular descriptors, which makes it difficult to screen for robust ones. Previous studies select molecular descriptors manually, based on expert knowledge and experience. Such selections are highly subjective, which can lead to ineffective molecular descriptors. In this paper, a novel approach is presented to address these problems in the context of regression modelling and feature selection. First, two filter and two embedded scoring metrics are proposed to jointly rank and select the most relevant and robust molecular descriptors. The selected features are then used to build a supervised data-driven model, namely the eXtreme Gradient Boosting (XGBoost) algorithm. Experimental results show that the selected molecular descriptors give good predictions of the target ERα bioactivity and that the regression approach outperforms previous models.
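The abstract does not say which four scoring metrics are used or how they are combined, so the following is only a minimal sketch of one plausible way to rank descriptors jointly: rank them under each metric, then keep those with the best mean rank. The mean-rank rule, the metric names (`filter_a`, `embedded_a`, ...), the descriptor names, and all score values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def rank_descending(scores):
    """Map each descriptor to its rank (1 = highest score); ties share the mean rank."""
    ordered = sorted(scores, key=lambda name: scores[name], reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of descriptors tied with ordered[i].
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        shared = (i + 1 + j + 1) / 2  # mean of the 1-based positions i+1 .. j+1
        for name in ordered[i:j + 1]:
            ranks[name] = shared
        i = j + 1
    return ranks


def select_descriptors(score_table, k):
    """Return the k descriptors with the lowest mean rank across all metrics."""
    per_metric = [rank_descending(scores) for scores in score_table.values()]
    names = list(per_metric[0])
    mean_rank = {n: mean(r[n] for r in per_metric) for n in names}
    # Sort by mean rank, then by name so the result is deterministic.
    return sorted(names, key=lambda n: (mean_rank[n], n))[:k]


# Hypothetical scores: two filter metrics and two embedded metrics (assumed values).
score_table = {
    "filter_a":   {"desc_1": 0.62, "desc_2": 0.15, "desc_3": 0.48, "desc_4": 0.05},
    "filter_b":   {"desc_1": 0.40, "desc_2": 0.10, "desc_3": 0.55, "desc_4": 0.10},
    "embedded_a": {"desc_1": 0.30, "desc_2": 0.20, "desc_3": 0.45, "desc_4": 0.05},
    "embedded_b": {"desc_1": 0.50, "desc_2": 0.25, "desc_3": 0.20, "desc_4": 0.05},
}

selected = select_descriptors(score_table, k=2)
print(selected)
```

In a full pipeline, the columns named in `selected` would then be passed to a gradient boosting regressor such as `xgboost.XGBRegressor`. That step is left out here to keep the sketch free of external dependencies.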