Quantitative Structure-Activity Relationships of Estrogen Receptor Alpha Based on Molecular Descriptors Selection and Extreme Gradient Boosting

Shaotong Liu, Zhewei Xu, Dongsheng Ye
Proceedings of the 2023 7th International Conference on Machine Learning and Soft Computing. Published 2023-01-05. DOI: 10.1145/3583788.3583807 (https://doi.org/10.1145/3583788.3583807)

Abstract

Quantitative structure-activity relationship (QSAR) modelling, which aims to estimate the bioactivity of compounds against estrogen receptor alpha (ERα) from their chemical features, is a fundamental part of drug discovery for breast cancer treatment. Because the data properties vary widely, building a suitable QSAR model is challenging. A further difficulty lies in the complexity of molecular descriptors, which makes it hard to screen for robust ones. Previous studies select molecular descriptors manually, based on expert knowledge and experience; such choices are highly subjective and can yield ineffective descriptors. This paper presents a novel approach that addresses these problems through feature selection and regression modelling. First, two filter-based and two embedded scoring metrics are proposed to jointly rank and select the most relevant and robust molecular descriptors. The selected features are then used to build a supervised, data-driven model using the eXtreme Gradient Boosting (XGBoost) algorithm. Experimental results show that the selected molecular descriptors give good predictions of the target ERα bioactivity and that the regression approach outperforms formal models.
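The abstract does not say which four scoring metrics are used or how their scores are combined. As a rough illustration only, the sketch below assumes mean-rank aggregation: each metric ranks the descriptors independently, and the descriptors with the best average rank are kept. The metric names (`filter_a`, `embedded_a`, and so on), the descriptor names, and all score values are hypothetical placeholders, not taken from the paper.

```python
from statistics import mean


def ranks(scores):
    """Return the rank of each item (1 = highest score); ties keep input order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, idx in enumerate(order, start=1):
        result[idx] = position
    return result


def joint_top_k(score_table, names, k):
    """Pick the k descriptors with the lowest mean rank across all metrics.

    score_table maps a metric name to one score per descriptor,
    where a higher score means a more relevant descriptor.
    """
    per_metric = [ranks(s) for s in score_table.values()]
    mean_rank = [mean(column) for column in zip(*per_metric)]
    order = sorted(range(len(names)), key=lambda i: mean_rank[i])
    return [names[i] for i in order[:k]]


# Hypothetical descriptors and made-up scores, for illustration only.
names = ["d1", "d2", "d3", "d4"]
scores = {
    "filter_a": [0.9, 0.1, 0.5, 0.3],
    "filter_b": [0.8, 0.2, 0.6, 0.1],
    "embedded_a": [0.7, 0.05, 0.9, 0.2],
    "embedded_b": [0.9, 0.1, 0.8, 0.3],
}
selected = joint_top_k(scores, names, k=2)
print(selected)
```

Under this assumption, the columns named in `selected` would then be passed to a gradient-boosted regressor such as XGBoost to predict ERα bioactivity. Rank averaging is only one possible way to combine the metrics; it has the practical advantage of not requiring the four scores to share a scale.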