{"title":"用正交初始化方法生长神经网络","authors":"Xinglin Pan","doi":"10.1117/12.2667654","DOIUrl":null,"url":null,"abstract":"In the training of neural networks, the architecture is usually determined first and then the parameters are selected by an optimizer. The choice of architecture and parameters is often independent. Whenever the architecture is modified, an expensive retraining of the parameters is required. In this work, we focus on growing the architecture instead of the expensive retraining. There are two main ways to grow new neurons: splitting and adding. In this paper, we propose orthogonal initialization to mitigate the gradient vanish of the new adding neurons. We use QR decomposition to obtain orthogonal initialization. We performed detailed experiments on two datasets (CIFAR-10, CIFAR-100) and the experimental results show the efficiency of our method.","PeriodicalId":128051,"journal":{"name":"Third International Seminar on Artificial Intelligence, Networking, and Information Technology","volume":"12587 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Growing neural networks using orthogonal initialization\",\"authors\":\"Xinglin Pan\",\"doi\":\"10.1117/12.2667654\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the training of neural networks, the architecture is usually determined first and then the parameters are selected by an optimizer. The choice of architecture and parameters is often independent. Whenever the architecture is modified, an expensive retraining of the parameters is required. In this work, we focus on growing the architecture instead of the expensive retraining. There are two main ways to grow new neurons: splitting and adding. In this paper, we propose orthogonal initialization to mitigate the gradient vanish of the new adding neurons. We use QR decomposition to obtain orthogonal initialization. We performed detailed experiments on two datasets (CIFAR-10, CIFAR-100) and the experimental results show the efficiency of our method.\",\"PeriodicalId\":128051,\"journal\":{\"name\":\"Third International Seminar on Artificial Intelligence, Networking, and Information Technology\",\"volume\":\"12587 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-02-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Third International Seminar on Artificial Intelligence, Networking, and Information Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1117/12.2667654\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Third International Seminar on Artificial Intelligence, Networking, and Information Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1117/12.2667654","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Growing neural networks using orthogonal initialization
In the training of neural networks, the architecture is usually fixed first and the parameters are then selected by an optimizer; the choices of architecture and parameters are largely independent. Whenever the architecture is modified, an expensive retraining of the parameters is required. In this work, we focus on growing the architecture to avoid this expensive retraining. There are two main ways to grow new neurons: splitting existing neurons and adding new ones. We propose an orthogonal initialization, obtained via QR decomposition, to mitigate gradient vanishing in the newly added neurons. We conducted detailed experiments on two datasets (CIFAR-10 and CIFAR-100), and the results demonstrate the effectiveness of our method.
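
For illustration, the sketch below shows how QR-based orthogonal initialization of newly added neurons could look in a PyTorch setting. This is a minimal sketch under assumed conventions, not the paper's released code; the function name `orthogonal_init_new_neurons` and the choice of growing a `torch.nn.Linear` layer are illustrative only.

```python
import torch

def orthogonal_init_new_neurons(layer: torch.nn.Linear, num_new: int) -> torch.nn.Linear:
    """Grow `layer` by `num_new` output neurons, initializing the new rows
    with orthonormal vectors obtained from a QR decomposition.
    Illustrative sketch, not the authors' implementation."""
    old_w = layer.weight.data                 # shape: (out_features, in_features)
    out_features, in_features = old_w.shape

    # QR decomposition of a random Gaussian matrix gives a Q with orthonormal columns
    # (assumes num_new <= in_features, so the reduced QR yields num_new orthonormal columns).
    a = torch.randn(in_features, num_new)
    q, _ = torch.linalg.qr(a)                 # q: (in_features, num_new)
    new_rows = q.t()                          # (num_new, in_features), orthonormal rows

    grown = torch.nn.Linear(in_features, out_features + num_new,
                            bias=layer.bias is not None)
    with torch.no_grad():
        grown.weight[:out_features] = old_w   # keep the already-trained weights
        grown.weight[out_features:] = new_rows
        if layer.bias is not None:
            grown.bias[:out_features] = layer.bias.data
            grown.bias[out_features:] = 0.0
    return grown

# Example usage: grow a 128->64 layer by 16 orthogonally initialized neurons.
layer = torch.nn.Linear(128, 64)
layer = orthogonal_init_new_neurons(layer, num_new=16)
print(layer.weight.shape)                     # torch.Size([80, 128])
```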