{"title":"Improving Convolutional Neural Network Using Pseudo Derivative ReLU","authors":"Zheng Hu, Yongping Li, Zhiyong Yang","doi":"10.1109/ICSAI.2018.8599372","DOIUrl":null,"url":null,"abstract":"Rectified linear unit (ReLU) is a widely used activation function in artificial neural networks, it is considered to be an efficient active function benefit from its simplicity and nonlinearity. However, ReLU’s derivative for negative inputs is zero, which can make some ReLUs inactive for essentially all inputs during the training. There are several ReLU variations for solving this problem. Comparing with ReLU, they are slightly different in form, and bring other drawbacks like more expensive in computation. In this study, pseudo derivatives were tried replacing original derivative of ReLU while ReLU itself was unchanged. The pseudo derivative was designed to alleviate the zero derivative problem and be consistent with original derivative in general. Experiments showed using pseudo derivative ReLU (PD-ReLU) could obviously improve AlexNet (a typical convolutional neural network model) in CIFAR-10 and CIFAR-100 tests. Furthermore, some empirical criteria for designing such pseudo derivatives were proposed.","PeriodicalId":375852,"journal":{"name":"2018 5th International Conference on Systems and Informatics (ICSAI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 5th International Conference on Systems and Informatics (ICSAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSAI.2018.8599372","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 16
Abstract
Rectified linear unit (ReLU) is a widely used activation function in artificial neural networks; it is considered an efficient activation function owing to its simplicity and nonlinearity. However, ReLU's derivative for negative inputs is zero, which can make some ReLUs inactive for essentially all inputs during training. Several ReLU variants have been proposed to address this problem. Compared with ReLU, they differ slightly in form and introduce other drawbacks, such as higher computational cost. In this study, pseudo derivatives were used in place of ReLU's original derivative, while ReLU itself was left unchanged. The pseudo derivative was designed to alleviate the zero-derivative problem while remaining broadly consistent with the original derivative. Experiments showed that using pseudo derivative ReLU (PD-ReLU) clearly improved AlexNet (a typical convolutional neural network model) on the CIFAR-10 and CIFAR-100 tests. Furthermore, some empirical criteria for designing such pseudo derivatives were proposed.
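The sketch below illustrates the general idea described in the abstract: keep the ReLU forward pass unchanged, but use a nonzero pseudo derivative for negative inputs in the backward pass. The paper does not specify its exact pseudo derivatives here, so the constant negative-side slope of 0.1, the PyTorch framework, and the names PDReLUFunction / pd_relu are illustrative assumptions, not the authors' formulation.

```python
# Minimal sketch (assumed, not the authors' exact pseudo derivative):
# forward pass is standard ReLU; backward pass substitutes a pseudo
# derivative that is nonzero for negative inputs so "dead" units
# still receive a gradient signal.
import torch

class PDReLUFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x.clamp(min=0)  # unchanged ReLU output

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pseudo derivative: 1 for positive inputs (same as ReLU),
        # an assumed small constant 0.1 for negative inputs.
        pseudo_grad = torch.where(
            x > 0, torch.ones_like(x), torch.full_like(x, 0.1)
        )
        return grad_output * pseudo_grad

def pd_relu(x):
    return PDReLUFunction.apply(x)

if __name__ == "__main__":
    x = torch.randn(4, requires_grad=True)
    pd_relu(x).sum().backward()
    print(x.grad)  # negative inputs get gradient 0.1 instead of 0
```

In use, pd_relu would simply replace the ReLU activations in a network such as AlexNet; since the forward computation is identical to ReLU, only the training dynamics change.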