Imbalanced Toxicity Prediction Using Multi-Task Learning and Over-Sampling

Jincheng Li
2020 International Conference on Machine Learning and Cybernetics (ICMLC), December 2, 2020. DOI: 10.1109/ICMLC51923.2020.9469546
Chemical compound toxicity prediction is a challenging learning problem: the number of active chemicals available from toxicity assays is far smaller than the number of inactive chemicals, i.e., the data are imbalanced. Neural networks trained on such imbalanced data tend to misclassify minority-class samples as majority-class samples. In this paper, we propose a novel learning method that combines multi-task deep neural network learning with over-sampling to handle the imbalanced-data and scarce-training-data problems of toxicity prediction. Over-sampling is a re-sampling method that tackles class imbalance by replicating minority-class samples. For each toxicity prediction task, we apply over-sampling to the training set to generate synthetic samples of the minority class and balance the training data. We then train the multi-task deep neural network on the tasks with the balanced training sets. Multi-task learning shares common information among tasks, and the balanced data sets provide more training data; both properties benefit multi-task deep neural network learning. Experimental results on the Tox21 toxicity prediction data set show that our method significantly relieves the imbalanced-data problem in multi-task deep neural network learning and outperforms the multi-task deep neural network without over-sampling, as well as other computational approaches such as support vector machines and random forests.
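The per-task balancing step described above can be illustrated with a minimal sketch. This is a simple random over-sampler that replicates minority-class samples until the two classes are equal in size; it is an assumption-laden stand-in for the paper's sampler (the paper generates synthetic minority samples, which may involve a SMOTE-style interpolation rather than plain replication), and the function name and data layout here are hypothetical.

```python
import random

def oversample(features, labels, seed=0):
    """Randomly replicate minority-class samples until the two classes
    are balanced. A stand-in for the paper's over-sampling step; the
    paper's actual sampler may synthesize new samples instead of
    replicating existing ones."""
    rng = random.Random(seed)
    pos = [x for x, y in zip(features, labels) if y == 1]
    neg = [x for x, y in zip(features, labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    min_label = 1 if minority is pos else 0
    maj_label = 1 - min_label
    # Draw (with replacement) enough minority copies to match the majority.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced_x = majority + minority + extra
    balanced_y = [maj_label] * len(majority) + [min_label] * (len(minority) + len(extra))
    return balanced_x, balanced_y

# Tiny toxicity-style example: 1 active (toxic) vs. 4 inactive compounds.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
Xb, yb = oversample(X, y)
print(sum(yb), len(yb) - sum(yb))  # classes now balanced: 4 4
```

In the multi-task setting, a sampler like this would be run independently on each assay's training set before the shared network is trained, since each Tox21 task has its own active/inactive ratio.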