Imbalanced Toxicity Prediction Using Multi-Task Learning and Over-Sampling

2020 International Conference on Machine Learning and Cybernetics (ICMLC). Pub Date: 2020-12-02. DOI: 10.1109/ICMLC51923.2020.9469546

Jincheng Li

Citations: 2

Abstract

Chemical compound toxicity prediction is a challenging learning problem because the number of active chemicals obtained for toxicity assays is far smaller than the number of inactive chemicals, i.e., the data are imbalanced. Neural networks trained on such imbalanced tasks tend to misclassify minority-class samples as majority-class samples. In this paper, we propose a novel learning method that combines multi-task deep neural network learning with over-sampling to handle both the imbalanced data and the lack of training data in toxicity prediction. Over-sampling is a re-sampling method that tackles the class imbalance problem by replicating minority-class samples. For each toxicity prediction task, we apply over-sampling to the training set to generate synthetic samples of the minority class and balance the training data. We then train the multi-task deep neural network on the tasks with the balanced training sets. Multi-task learning shares common information among tasks, and the balanced data sets provide more training data; both benefit multi-task deep neural network learning. Experimental results on the Tox21 toxicity prediction data set show that our method significantly alleviates the imbalanced data problem in multi-task deep neural network learning and outperforms a multi-task deep neural network without over-sampling, as well as many other computational approaches such as support vector machines and random forests.
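The per-task balancing step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation. It uses plain random over-sampling (duplicating randomly chosen minority samples with replacement); because the abstract also speaks of generating synthetic samples, the paper's exact variant may differ (SMOTE-style interpolation is one common alternative). It further assumes that the label matrix stores `None` for compound/assay pairs with no result, as Tox21 contains many such missing labels, and that the multi-task network's loss skips those entries. The names `random_oversample`, `balance_each_task`, and `masked_multitask_bce` are hypothetical, and the network itself is not shown.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple

Features = Sequence[float]


def random_oversample(
    features: Sequence[Features], labels: Sequence[int], seed: int = 0
) -> Tuple[List[Features], List[int]]:
    """Balance one binary task (labels 0/1) by replicating minority samples.

    Hypothetical helper: plain random over-sampling with replacement,
    not necessarily the variant used in the paper.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        raise ValueError("task has no samples of one class; cannot over-sample")
    # Draw extra minority indices (with replacement) until both classes match.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    order = pos + neg + extra
    rng.shuffle(order)
    return [features[i] for i in order], [labels[i] for i in order]


def balance_each_task(
    features: Sequence[Features],
    label_matrix: Sequence[Sequence[Optional[int]]],
    seed: int = 0,
) -> Dict[int, Tuple[List[Features], List[int]]]:
    """Over-sample each task separately, skipping compounds with no label.

    Hypothetical helper; assumes None marks a missing assay result.
    """
    n_tasks = len(label_matrix[0])
    balanced = {}
    for t in range(n_tasks):
        rows = [i for i, row in enumerate(label_matrix) if row[t] is not None]
        x_t = [features[i] for i in rows]
        y_t = [label_matrix[i][t] for i in rows]
        balanced[t] = random_oversample(x_t, y_t, seed=seed + t)
    return balanced


def masked_multitask_bce(
    probs: Sequence[Sequence[float]],
    labels: Sequence[Sequence[Optional[int]]],
    eps: float = 1e-7,
) -> float:
    """Mean binary cross-entropy over observed (compound, task) pairs only.

    Hypothetical loss for a network with one sigmoid output per task:
    missing labels (None) contribute nothing, so each task head is trained
    only on compounds that were actually tested in that assay.
    """
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, labels):
        for p, y in zip(p_row, y_row):
            if y is None:
                continue
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    feats = [[float(i)] for i in range(5)]
    labs = [[1, None], [0, 1], [0, 0], [0, 0], [None, 0]]
    for task, (x, y) in balance_each_task(feats, labs).items():
        print(task, y.count(1), y.count(0))  # each task: 3 actives, 3 inactives
```

Each task's balanced set could then be drawn into mini-batches for the shared network, with a masked loss such as `masked_multitask_bce` ensuring that a compound only updates the task heads for which it has a label. How the paper actually interleaves the per-task balanced sets during multi-task training is not specified in the abstract.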
