The Perceptron
Pub Date: 2019-03-01 | DOI: 10.1142/9789811201233_0004
Volker Tresp
The perceptron implements a binary classifier $f : \mathbb{R}^D \to \{+1, -1\}$ with a linear decision surface through the origin:
$$f(x) = \mathrm{step}(\theta^T x), \quad (1)$$
where $\mathrm{step}(z) = +1$ if $z \geq 0$ and $-1$ otherwise. Using the zero-one loss $L(y, f(x)) = 0$ if $y = f(x)$ and $1$ otherwise, the empirical risk of the perceptron on training data $S = \{(x_i, y_i)\}_{i=1}^N$ is $R_{\mathrm{emp}}(\theta) = \frac{1}{N}\sum_{i=1}^{N} L(y_i, f(x_i))$. The problem with this is that $R_{\mathrm{emp}}(\theta)$ is not differentiable in $\theta$, so we cannot do gradient descent to learn $\theta$. To circumvent this, we use the modified empirical loss
$$R_{\mathrm{emp}}(\theta) = \sum_{i \in \{1,\dots,N\} \,:\, y_i \neq \mathrm{step}(\theta^T x_i)} -y_i\, \theta^T x_i. \quad (2)$$
This says that correctly classified examples incur no loss at all, while each misclassified example contributes $-y_i\, \theta^T x_i > 0$, which is a measure of confidence in the (incorrect) labeling. We can now use gradient descent to learn $\theta$: starting from an arbitrary $\theta^{(0)}$, we update the parameter vector according to $\theta^{(t+1)} = \theta^{(t)} - \eta \nabla_\theta R_{\mathrm{emp}}|_{\theta^{(t)}}$, where $\eta$, called the learning rate, is a parameter of our choosing. The gradient of (2) is again a sum over the misclassified examples: $\nabla_\theta R_{\mathrm{emp}}(\theta) = \sum_{i \,:\, y_i \neq \mathrm{step}(\theta^T x_i)} -y_i\, x_i$. A slightly more principled way to look at this is to derive the modified risk from the hinge loss $L(y, \theta^T x) = \max(0, -y\, \theta^T x)$.
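To make the update rule concrete, here is a minimal NumPy sketch of gradient descent on the modified loss (2). It is not from the chapter: the function names, hyperparameters (`eta`, `n_iters`), and synthetic data are illustrative assumptions.

```python
import numpy as np

def step(z):
    """step(z) = +1 if z >= 0, -1 otherwise, applied elementwise."""
    return np.where(z >= 0, 1.0, -1.0)

def train_perceptron(X, y, eta=0.1, n_iters=100):
    """Gradient descent on the modified risk (2).

    X: (N, D) data matrix; y: (N,) labels in {+1, -1}.
    Only misclassified points contribute -y_i * theta^T x_i to the loss.
    """
    theta = np.zeros(X.shape[1])  # arbitrary theta^(0)
    for _ in range(n_iters):
        mis = step(X @ theta) != y          # currently misclassified examples
        if not mis.any():                   # data separated: gradient is zero
            break
        # gradient of (2): sum over misclassified i of -y_i * x_i
        grad = -(y[mis, None] * X[mis]).sum(axis=0)
        theta -= eta * grad                 # theta^(t+1) = theta^(t) - eta * grad
    return theta

# Hypothetical usage on synthetic, linearly separable data:
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = step(X @ np.array([1.0, -2.0]))
print("learned theta:", train_perceptron(X, y))
```

Because (2) sums only over currently misclassified points, the gradient vanishes as soon as the data are separated, which is why the loop can stop early.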
{"title":"The Perceptron","authors":"Volker Tresp","doi":"10.1142/9789811201233_0004","DOIUrl":"https://doi.org/10.1142/9789811201233_0004","url":null,"abstract":"The perceptron implements a binary classifier f : R D → {+1, −1} with a linear decision surface through the origin: f (x) = step(θ x). (1) where step(z) = 1 if z ≥ 0 −1 otherwise. Using the zero-one loss L(y, f (x)) = 0 if y = f (x) 1 otherwise, the empirical risk of the perceptron on training data S = 1. The problem with this is that R emp (θ) is not differentiable in θ, so we cannot do gradient descent to learn θ. To circumvent this, we use the modified empirical loss R emp (θ) = i∈(1,2,...,N) : yi =step θ T xi −y i θ T x i. (2) This just says that correctly classified examples don't incur any loss at all, while incorrectly classified examples contribute θ T x i , which is some sort of measure of confidence in the (incorrect) labeling. 1 We can now use gradient descent to learn θ. Starting from an arbitrary θ (0) , we update our parameter vector according to θ (t+1) = θ (t) − η∇ θ R| θ (t) , where η, called the learning rate, is a parameter of our choosing. The gradient of (2) is again a sum over the misclassified examples: ∇ θ R emp (θ) = 1 A slightly more principled way to look at this is to derive this modified risk from the hinge loss L(y, θ T x) = max 0, −y θ T x .","PeriodicalId":188131,"journal":{"name":"Principles of Artificial Neural Networks","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132263914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Performance of DLNN — Comparative Case Studies
Pub Date: 2019-03-01 | DOI: 10.1142/9789811201233_0016
{"title":"Performance of DLNN — Comparative Case Studies","authors":"","doi":"10.1142/9789811201233_0016","DOIUrl":"https://doi.org/10.1142/9789811201233_0016","url":null,"abstract":"","PeriodicalId":188131,"journal":{"name":"Principles of Artificial Neural Networks","volume":"128 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134144132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Back Propagation
Pub Date: 2019-03-01 | DOI: 10.1142/9789811201233_0006
Barak Oshri, Vincent Chen, Nish Khandwala, Yi Wen
{"title":"Back Propagation","authors":"Barak Oshri, Vincent Chen, Nish Khandwala, Yi Wen, TA Yi Wen","doi":"10.1142/9789811201233_0006","DOIUrl":"https://doi.org/10.1142/9789811201233_0006","url":null,"abstract":"","PeriodicalId":188131,"journal":{"name":"Principles of Artificial Neural Networks","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122836414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}