{"title":"关于反向传播学习的块梯度算法的收敛性","authors":"H. Paugam-Moisy","doi":"10.1109/IJCNN.1992.227082","DOIUrl":null,"url":null,"abstract":"A block-gradient algorithm is defined, where the weight matrix is updated after every presentation of a block of b examples each. Total and stochastic gradients are included in the block-gradient algorithm, for particular values of b. Experimental laws are stated on the speed of convergence, according to the block size. The first law indicates that an adaptive learning rate has to respect an exponential decreasing function of the number of examples presented between two successive weight updates. The second law states that, with an adaptive learning rate value, the number of epochs grows linearly with the size of the exemplar blocks. The last one shows how the number of epochs for reaching a given level of performance depends on the learning rate.<<ETX>>","PeriodicalId":286849,"journal":{"name":"[Proceedings 1992] IJCNN International Joint Conference on Neural Networks","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1992-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"On the convergence of a block-gradient algorithm for back-propagation learning\",\"authors\":\"H. Paugam-Moisy\",\"doi\":\"10.1109/IJCNN.1992.227082\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A block-gradient algorithm is defined, where the weight matrix is updated after every presentation of a block of b examples each. Total and stochastic gradients are included in the block-gradient algorithm, for particular values of b. Experimental laws are stated on the speed of convergence, according to the block size. The first law indicates that an adaptive learning rate has to respect an exponential decreasing function of the number of examples presented between two successive weight updates. The second law states that, with an adaptive learning rate value, the number of epochs grows linearly with the size of the exemplar blocks. The last one shows how the number of epochs for reaching a given level of performance depends on the learning rate.<<ETX>>\",\"PeriodicalId\":286849,\"journal\":{\"name\":\"[Proceedings 1992] IJCNN International Joint Conference on Neural Networks\",\"volume\":\"17 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1992-06-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"[Proceedings 1992] IJCNN International Joint Conference on Neural Networks\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IJCNN.1992.227082\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"[Proceedings 1992] IJCNN International Joint Conference on Neural Networks","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN.1992.227082","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
On the convergence of a block-gradient algorithm for back-propagation learning
A block-gradient algorithm is defined in which the weight matrix is updated after each presentation of a block of b examples. The total gradient and the stochastic gradient are obtained as particular cases of the block-gradient algorithm, for specific values of b. Experimental laws on the speed of convergence are stated as a function of the block size. The first law indicates that an adaptive learning rate must follow an exponentially decreasing function of the number of examples presented between two successive weight updates. The second law states that, with an adaptive learning-rate value, the number of epochs grows linearly with the size of the example blocks. The last law shows how the number of epochs needed to reach a given level of performance depends on the learning rate.
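To make the update rule concrete, the following is a minimal NumPy sketch of a block-gradient back-propagation loop: gradients are accumulated over a block of b examples and the weights are updated once per block, so that block_size=1 recovers the stochastic gradient and block_size=N the total gradient. The two-layer sigmoid network, the learning rate, and the XOR task are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def block_gradient_train(X, T, n_hidden=8, block_size=4, lr=0.5, epochs=100, seed=0):
    """Back-propagation with one weight update per block of b examples.

    block_size=1 gives the stochastic gradient; block_size=len(X) the total gradient.
    """
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], T.shape[1]
    W1 = rng.normal(scale=0.5, size=(n_in, n_hidden))
    W2 = rng.normal(scale=0.5, size=(n_hidden, n_out))
    for _ in range(epochs):
        for start in range(0, len(X), block_size):
            xb, tb = X[start:start + block_size], T[start:start + block_size]
            # forward pass over the block
            h = sigmoid(xb @ W1)
            y = sigmoid(h @ W2)
            # backward pass (quadratic error); gradients summed over the block
            dy = (y - tb) * y * (1 - y)
            dh = (dy @ W2.T) * h * (1 - h)
            # single weight update after the whole block has been presented
            W2 -= lr * h.T @ dy
            W1 -= lr * xb.T @ dh
    return W1, W2

# Usage sketch: XOR with blocks of 2 examples (b between 1 and 4 here).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
W1, W2 = block_gradient_train(X, T, block_size=2, epochs=5000)
print(np.round(sigmoid(sigmoid(X @ W1) @ W2), 2))
```

In this reading of the abstract's first law, the learning rate used with larger blocks would have to be reduced roughly exponentially in b to keep the updates stable, which is why a fixed lr in the sketch above is only reasonable for small block sizes.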