GPIL: Gradient with PseudoInverse Learning for High Accuracy Fine-Tuning

Gilha Lee, N. Kim, Hyun Kim

2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)

Published: 2023-06-11 · DOI: 10.1109/AICAS57966.2023.10168584
Abstract
PseudoInverse learning (PIL) was proposed to speed up the convergence of conventional gradient descent: it trains convolutional neural networks (CNNs) quickly and reliably, without gradients, by using a pseudoinverse matrix. However, PIL has several problems when training a network. First, it runs out of memory because the entire dataset must be processed at once in each training epoch. Second, the network cannot be made deeper, because the input pseudoinverse matrices become increasingly unreliable as more PIL layers are stacked. Consequently, PIL has not yet been applied effectively to widely used deep models. Motivated by these limitations, we propose a novel error propagation methodology that makes fine-tuning, which is often performed in resource-constrained environments, more accurate. Specifically, by combining PIL with gradient descent, we not only enable mini-batch training, which is impossible with PIL alone, but also achieve higher accuracy through more accurate error propagation. Moreover, unlike existing PIL, which uses only the pseudoinverse of the CNN input, we additionally use the pseudoinverse of the weights to compensate for PIL's limitations; the proposed method therefore enables faster and more accurate error propagation during CNN training. As a result, it is efficient for fine-tuning in resource-constrained settings such as mobile/edge devices, which require high accuracy within a small number of training epochs. Experimental results show that the proposed method improves accuracy by 2.78% over the baseline when fine-tuning ResNet-101 on the CIFAR-100 dataset.
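The abstract names two pseudoinverse ideas: the classic PIL closed-form weight solve from the pseudoinverse of a layer's input, and the proposed use of the weight pseudoinverse to carry the error backward. The paper's actual algorithm is not given here, so the following numpy sketch only illustrates those two operations on a single linear layer; all shapes and variable names (`H`, `T`, `W`, `E_out`, `E_hidden`) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single layer: hidden activations H (batch x features) and targets T
# (batch x classes). In PIL these would come from the preceding layers.
H = rng.standard_normal((64, 32))
T = rng.standard_normal((64, 10))

# PIL-style closed-form weight solve: W = H^+ T, where H^+ is the
# Moore-Penrose pseudoinverse of the layer input. This is the
# least-squares minimizer of ||H @ W - T||_F, computed without gradients.
W = np.linalg.pinv(H) @ T

# Output-layer error after the solve.
E_out = H @ W - T

# GPIL-flavored backward step (assumed, for illustration): map the output
# error back through the weight pseudoinverse W^+ instead of the usual
# transpose W^T, giving a least-squares preimage of the error at the
# hidden layer. A gradient-descent step could then refine earlier layers.
E_hidden = E_out @ np.linalg.pinv(W)
```

Note that `np.linalg.pinv(H) @ T` coincides with the ordinary least-squares solution `np.linalg.lstsq(H, T)` when `H` has full column rank, which is why PIL can fit a layer in one shot but also why it needs the whole batch in memory at once.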