{"title":"人工神经网络学习的信息论观点","authors":"E. Balda, A. Behboodi, R. Mathar","doi":"10.1109/ICSPCS.2018.8631758","DOIUrl":null,"url":null,"abstract":"Deep learning based on Artificial Neural Networks (ANNs) has achieved great successes over the last years. However, gaining insight into the fundamentals and explaining their functionality is an open research area of high interest. In this paper, we use an information theoretic approach to reveal typical learning patterns of ANNs. For this purpose the training samples, the true labels, and the estimated labels are considered as random variables. Then, the mutual information and conditional entropy between these variables are studied. We show that the learning process of ANNs consists of essentially two phases. First, the network learns mostly about the input samples without significant improvement in the accuracy, thereafter the correct class allocation becomes more pronounced. This is based on investigating the conditional entropy of the estimated class label given the true one in the course of training. We next derive bounds on the conditional entropy as a function of the error probability, which provide interesting insights into the learning behavior of ANNs. Theoretical investigations are accompanied by extensive numerical studies on an artificial data set as well as the MNIST and CIFAR benchmark data using the widely known networks LeNet-5 and DenseNet. Amazingly, in all cases the bounds are nearly attained in later stages of the training phase, which allows for an analytical measure of the training status of an ANN.","PeriodicalId":179948,"journal":{"name":"2018 12th International Conference on Signal Processing and Communication Systems (ICSPCS)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"An Information Theoretic View on Learning of Artificial Neural Networks\",\"authors\":\"E. Balda, A. Behboodi, R. Mathar\",\"doi\":\"10.1109/ICSPCS.2018.8631758\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep learning based on Artificial Neural Networks (ANNs) has achieved great successes over the last years. However, gaining insight into the fundamentals and explaining their functionality is an open research area of high interest. In this paper, we use an information theoretic approach to reveal typical learning patterns of ANNs. For this purpose the training samples, the true labels, and the estimated labels are considered as random variables. Then, the mutual information and conditional entropy between these variables are studied. We show that the learning process of ANNs consists of essentially two phases. First, the network learns mostly about the input samples without significant improvement in the accuracy, thereafter the correct class allocation becomes more pronounced. This is based on investigating the conditional entropy of the estimated class label given the true one in the course of training. We next derive bounds on the conditional entropy as a function of the error probability, which provide interesting insights into the learning behavior of ANNs. Theoretical investigations are accompanied by extensive numerical studies on an artificial data set as well as the MNIST and CIFAR benchmark data using the widely known networks LeNet-5 and DenseNet. 
Amazingly, in all cases the bounds are nearly attained in later stages of the training phase, which allows for an analytical measure of the training status of an ANN.\",\"PeriodicalId\":179948,\"journal\":{\"name\":\"2018 12th International Conference on Signal Processing and Communication Systems (ICSPCS)\",\"volume\":\"82 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 12th International Conference on Signal Processing and Communication Systems (ICSPCS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICSPCS.2018.8631758\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 12th International Conference on Signal Processing and Communication Systems (ICSPCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSPCS.2018.8631758","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
An Information Theoretic View on Learning of Artificial Neural Networks
Deep learning based on Artificial Neural Networks (ANNs) has achieved great success in recent years. However, gaining insight into its fundamentals and explaining its functionality remains an open research area of high interest. In this paper, we use an information theoretic approach to reveal typical learning patterns of ANNs. For this purpose, the training samples, the true labels, and the estimated labels are considered as random variables, and the mutual information and conditional entropy between these variables are studied. We show that the learning process of ANNs consists of essentially two phases: first, the network learns mostly about the input samples without significant improvement in accuracy; thereafter, the correct class allocation becomes more pronounced. This finding is based on investigating the conditional entropy of the estimated class label given the true one over the course of training. We then derive bounds on this conditional entropy as a function of the error probability, which provide interesting insights into the learning behavior of ANNs. The theoretical investigations are accompanied by extensive numerical studies on an artificial data set as well as the MNIST and CIFAR benchmark data, using the widely known networks LeNet-5 and DenseNet. Remarkably, in all cases the bounds are nearly attained in later stages of training, which allows for an analytical measure of the training status of an ANN.
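The bounds referenced in the abstract relate the conditional entropy of the estimated class label given the true one to the classification error probability. As a rough illustration only, the following Python sketch estimates the empirical conditional entropy H(Y_hat | Y) from a confusion matrix and compares it against a classical Fano-style bound, h_b(P_e) + P_e * log2(K - 1). The function names and the use of this particular bound are assumptions made for illustration; the paper's exact bounds are not reproduced here.

import numpy as np

def conditional_entropy(y_true, y_pred, num_classes):
    # Empirical H(Y_hat | Y) in bits, estimated from a joint histogram (confusion matrix).
    joint = np.zeros((num_classes, num_classes))
    for t, p in zip(y_true, y_pred):
        joint[t, p] += 1.0
    joint /= joint.sum()
    p_true = joint.sum(axis=1, keepdims=True)  # marginal P(Y)
    cond = np.divide(joint, p_true, out=np.zeros_like(joint), where=p_true > 0)
    with np.errstate(divide="ignore", invalid="ignore"):
        log_cond = np.where(cond > 0, np.log2(cond), 0.0)
    return float(-(joint * log_cond).sum())

def binary_entropy(p):
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return float(-p * np.log2(p) - (1.0 - p) * np.log2(1.0 - p))

def fano_style_bound(error_prob, num_classes):
    # Classical Fano-type bound: H(Y_hat | Y) <= h_b(P_e) + P_e * log2(K - 1).
    # Used here only as an illustrative stand-in for the bounds derived in the paper.
    return binary_entropy(error_prob) + error_prob * np.log2(num_classes - 1)

# Toy usage with synthetic labels: the empirical conditional entropy stays below the bound,
# and the gap narrows as the predictions become consistent with the true labels.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 10, size=10_000)
noise = rng.integers(0, 10, size=10_000)
y_pred = np.where(rng.random(10_000) < 0.9, y_true, noise)
p_e = float(np.mean(y_pred != y_true))
print("H(Y_hat|Y) =", conditional_entropy(y_true, y_pred, 10))
print("Fano-style bound:", fano_style_bound(p_e, 10))

Tracking such an empirical conditional entropy over training epochs, alongside a bound of this type, is one plausible way to reproduce the kind of "training status" measure the abstract describes.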