Title: Toxicity prediction using locality-sensitive deep learner
Authors: Xiu Huan Yap, Michael Raymer
Journal: Computational Toxicology, volume 21, Article 100210
Publication date: 2022-02-01
DOI: 10.1016/j.comtox.2021.100210
URL: https://www.sciencedirect.com/science/article/pii/S2468111321000566
Citations: 2
Abstract
Linear QSAR models for toxicity prediction typically show good predictivity when trained at a small, local scale on similar chemicals, but not at a global scale spanning a chemical library. We hypothesize that large chemical toxicity datasets generally have a locally-linear data structure, and propose the locality-sensitive deep learner (LSDL), a deep neural network with an attention mechanism [1] and an optional instance-based feature weighting component, to tackle the challenges of a heterogeneous classification space with locally-varying noise features. On carefully-constructed synthetic data with extremely unbalanced classes (10% positive), the locality-sensitive deep learner with learned feature weights retained high test performance (AUC > 0.9) in the presence of 60% cluster-specific feature noise, while a feed-forward neural network appeared to over-fit the data (AUC < 0.6). On the Tox21 dataset [2], the locality-sensitive deep learner out-performed the feed-forward neural network on 9 of 12 labels. On the acetylcholinesterase inhibition (AChEi) [3], Collaborative Modeling Project for Androgen Receptor Activity (CoMPARA) [4], and Acute Oral Toxicity (AOT) [5] datasets, we observed that combining the locality-sensitive deep learner with a feed-forward neural network gave better test performance than either model alone in almost all cases. Generalizing machine learning models to fit locally-linear data may improve the predictivity of chemical toxicity models. The proposed modeling approach could complement and add diversity to the current suite of predictive toxicity algorithms for use in ensemble and/or consensus models.
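The locally-linear hypothesis in the abstract can be illustrated with a small toy experiment. The sketch below is not the authors' LSDL and does not reproduce their synthetic benchmark; it assumes two well-separated clusters whose labels follow opposite linear rules, and it assumes cluster membership is known in advance (LSDL is described as learning locality through attention instead). All function names, cluster centers, and sizes are hypothetical choices made for illustration. It compares one global least-squares linear classifier against one linear classifier fitted per cluster.

```python
import numpy as np

def make_locally_linear(rng, n_per_cluster=500, n_features=5):
    """Toy data (an assumption, not the paper's benchmark): two clusters,
    each labelled by its own linear rule on feature 1."""
    centers = np.zeros((2, n_features))
    centers[0, 0], centers[1, 0] = -5.0, 5.0   # clusters are separated along feature 0
    signs = np.array([1.0, -1.0])              # the local rule flips between clusters
    X, y, cluster = [], [], []
    for k in range(2):
        Xk = centers[k] + rng.standard_normal((n_per_cluster, n_features))
        yk = np.where(signs[k] * Xk[:, 1] > 0, 1.0, -1.0)
        X.append(Xk)
        y.append(yk)
        cluster.append(np.full(n_per_cluster, k))
    return np.vstack(X), np.concatenate(y), np.concatenate(cluster)

def fit_linear(X, y):
    """Least-squares linear classifier on +/-1 labels, with an intercept column."""
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict(w, X):
    A = np.hstack([X, np.ones((len(X), 1))])
    return np.where(A @ w > 0, 1.0, -1.0)

rng = np.random.default_rng(0)
X_tr, y_tr, c_tr = make_locally_linear(rng)
X_te, y_te, c_te = make_locally_linear(rng)

# One linear model for the whole "library".
w_global = fit_linear(X_tr, y_tr)
global_acc = float(np.mean(predict(w_global, X_te) == y_te))

# One linear model per cluster (cluster membership assumed known here).
local_pred = np.empty_like(y_te)
for k in (0, 1):
    w_k = fit_linear(X_tr[c_tr == k], y_tr[c_tr == k])
    local_pred[c_te == k] = predict(w_k, X_te[c_te == k])
local_acc = float(np.mean(local_pred == y_te))

print(f"global linear accuracy: {global_acc:.2f}")
print(f"local linear accuracy:  {local_acc:.2f}")
```

In this toy setting the per-cluster models recover each local rule, while a single hyperplane cannot satisfy both rules at once. That gap is the kind of behavior the abstract attributes to linear QSAR models trained locally versus globally.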
About the journal:
Computational Toxicology is an international journal publishing computational approaches that assist in the toxicological evaluation and safety assessment of new and existing chemical substances. Its scope includes:
- All effects relating to human health and environmental toxicity and fate
- Prediction of toxicity, metabolism, fate and physico-chemical properties
- The development of models from read-across, (Q)SARs, PBPK, QIVIVE, Multi-Scale Models
- Big Data in toxicology: integration, management, analysis
- Implementation of models through AOPs, IATA, TTC
- Regulatory acceptance of models: evaluation, verification and validation
- From metals, to small organic molecules to nanoparticles
- Pharmaceuticals, pesticides, foods, cosmetics, fine chemicals
- Bringing together the views of industry, regulators, academia, NGOs