vEXP: A virtual enhanced cross screen panel for off-target pharmacology alerts

Computational Toxicology (IF 2.9, Q2, Toxicology). Pub Date: 2024-09-01. Epub Date: 2024-07-22. DOI: 10.1016/j.comtox.2024.100324

James A. Lumley, David Fallon, Ryan Whatling, Damien Coupry, Andrew Brown

Computational Toxicology, volume 31, Article 100324. https://www.sciencedirect.com/science/article/pii/S2468111324000264

Citations: 0

Abstract

We describe the development of the GSK vEXP (virtual enhanced cross screen panel) for off-target pharmacology alerts. A panel of machine learning classification models, or QSAR (Quantitative Structure-Activity Relationship) models, for off-target safety assessment allows early alerting to risk factors in candidate drugs. The models are matched to an internal in-vitro biochemical screening panel described previously, with some updates reported here. When potency thresholds relevant to in-vitro screening are applied, some internal GSK datasets and most of the related external ChEMBL datasets prove to be extremely imbalanced. Their small size and bias towards the active class make many ChEMBL datasets un-modellable at such thresholds. Although larger, many GSK datasets remain too imbalanced to give a performant model. Merging internal and external data helps to rebalance datasets and improve the domain of applicability, and model performance frequently improves on merged data. Efforts to collate public datasets with a far better balance of the missing inactives would likely do more to improve open-source models than simply increasing dataset size. We investigate moving the probability threshold and applying imbalanced learners to help overcome the imbalance problem. Both methods can produce models with improved performance when applied to imbalanced datasets. Datasets with a class imbalance of 95:5 or with <100 compounds were un-modellable. Where datasets had a class imbalance of 90:10, the imbalanced-learning methods were often more performant than standard tree-based classifiers. No single classification algorithm consistently outperformed all others, and our approach emphasises a standardised, automated build-and-evaluate process across all classifiers to identify the best model. Applications of vEXP include ranking hit compounds for fast prioritisation, flagging hit series that carry systematic scaffold- or functional-group-related risks, and confirming that late-stage optimisation is not introducing new off-target activities into established chemical series.
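To make the two imbalance ideas in the abstract concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a binary active/inactive labelling (1 = active), uses balanced accuracy as the selection metric (the abstract does not name a metric), and scans a fixed grid of candidate thresholds. The function names `is_modellable`, `balanced_accuracy` and `best_threshold` are hypothetical. The modellability gate simply encodes the reported cut-offs (fewer than 100 compounds, or a class split of 95:5 or worse).

```python
from typing import List, Optional, Sequence, Tuple


def is_modellable(labels: Sequence[int],
                  min_compounds: int = 100,
                  max_majority_percent: int = 95) -> bool:
    """Gate based on the cut-offs reported in the abstract (assumed exact here):
    reject datasets with fewer than 100 compounds or a 95:5 (or worse) split."""
    n = len(labels)
    if n < min_compounds:
        return False
    n_active = sum(1 for y in labels if y == 1)
    majority = max(n_active, n - n_active)
    # Integer comparison avoids floating-point edge cases at exactly 95 %.
    return majority * 100 < max_majority_percent * n


def balanced_accuracy(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Mean of sensitivity and specificity; an assumed metric choice."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    sensitivity = tp / pos if pos else 0.0
    specificity = tn / neg if neg else 0.0
    return (sensitivity + specificity) / 2


def best_threshold(labels: Sequence[int],
                   probabilities: Sequence[float],
                   candidates: Optional[List[float]] = None) -> Tuple[float, float]:
    """Threshold moving: pick the probability cut-off that maximises
    balanced accuracy instead of using the default 0.5."""
    if candidates is None:
        candidates = [i / 100 for i in range(5, 100, 5)]
    best_t, best_score = 0.5, -1.0
    for t in candidates:
        preds = [1 if p >= t else 0 for p in probabilities]
        score = balanced_accuracy(labels, preds)
        if score > best_score:  # first (lowest) threshold wins ties
            best_t, best_score = t, score
    return best_t, best_score


# Toy 90:10 dataset: a model trained on imbalanced data often assigns the
# minority (active) class probabilities well below 0.5.
toy_labels = [1] * 10 + [0] * 90
toy_probs = [0.30 + 0.01 * i for i in range(10)] + [0.02 + 0.002 * i for i in range(90)]

default_preds = [1 if p >= 0.5 else 0 for p in toy_probs]
default_score = balanced_accuracy(toy_labels, default_preds)
moved_t, moved_score = best_threshold(toy_labels, toy_probs)
```

On this toy data every active scores below 0.5, so the default cut-off predicts no actives at all, whereas a lower cut-off separates the classes. In practice the threshold would be tuned on held-out validation data rather than on the training set, and the result compared with a dedicated imbalanced learner.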




Source journal

Computational Toxicology (Computer Science: Computer Science Applications)

CiteScore: 5.50

Self-citation rate: 0.00%

Review time: 56 days

Journal description: Computational Toxicology is an international journal publishing computational approaches that assist in the toxicological evaluation and safety assessment of new and existing chemical substances. Its scope includes:
- All effects relating to human health and environmental toxicity and fate
- Prediction of toxicity, metabolism, fate and physico-chemical properties
- The development of models from read-across, (Q)SARs, PBPK, QIVIVE, Multi-Scale Models
- Big Data in toxicology: integration, management, analysis
- Implementation of models through AOPs, IATA, TTC
- Regulatory acceptance of models: evaluation, verification and validation
- From metals, to small organic molecules to nanoparticles
- Pharmaceuticals, pesticides, foods, cosmetics, fine chemicals
- Bringing together the views of industry, regulators, academia, NGOs