End-to-End Training of Deep Visuomotor Policies
S. Levine, Chelsea Finn, Trevor Darrell, P. Abbeel
Policy search methods can allow robots to learn control policies for a wide range of tasks, but practical applications of policy search often require hand-engineered components for perception, state estimation, and low-level control. In this paper, we aim to answer the following question: does training the perception and control systems jointly end-to-end provide better performance than training each component separately? To this end, we develop a method that can be used to learn policies that map raw image observations directly to torques at the robot's motors. The policies are represented by deep convolutional neural networks (CNNs) with 92,000 parameters, and are trained using a partially observed guided policy search method, which transforms policy search into supervised learning, with supervision provided by a simple trajectory-centric reinforcement learning method. We evaluate our method on a range of real-world manipulation tasks that require close coordination between vision and control, such as screwing a cap onto a bottle, and present simulated comparisons to a range of prior policy search methods.
{"title":"End-to-End Training of Deep Visuomotor Policies","authors":"S. Levine, Chelsea Finn, Trevor Darrell, P. Abbeel","doi":"10.5555/2946645.2946684","DOIUrl":"https://doi.org/10.5555/2946645.2946684","url":null,"abstract":"Policy search methods can allow robots to learn control policies for a wide range of tasks, but practical applications of policy search often require hand-engineered components for perception, state estimation, and low-level control. In this paper, we aim to answer the following question: does training the perception and control systems jointly end-to-end provide better performance than training each component separately? To this end, we develop a method that can be used to learn policies that map raw image observations directly to torques at the robot's motors. The policies are represented by deep convolutional neural networks (CNNs) with 92,000 parameters, and are trained using a partially observed guided policy search method, which transforms policy search into supervised learning, with supervision provided by a simple trajectory-centric reinforcement learning method. We evaluate our method on a range of real-world manipulation tasks that require close coordination between vision and control, such as screwing a cap onto a bottle, and present simulated comparisons to a range of prior policy search methods.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"4 1 1","pages":"39:1-39:40"},"PeriodicalIF":0.0,"publicationDate":"2015-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90811761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Libra toolkit for probabilistic models
Daniel Lowd, Pedram Rooshenas
The Libra Toolkit is a collection of algorithms for learning and inference with discrete probabilistic models, including Bayesian networks, Markov networks, dependency networks, and sum-product networks. Compared to other toolkits, Libra places a greater emphasis on learning the structure of tractable models in which exact inference is efficient. It also includes a variety of algorithms for learning graphical models in which inference is potentially intractable, and for performing exact and approximate inference. Libra is released under a 2-clause BSD license to encourage broad use in academia and industry.
{"title":"The Libra toolkit for probabilistic models","authors":"Daniel Lowd, Pedram Rooshenas","doi":"10.5555/2789272.2912077","DOIUrl":"https://doi.org/10.5555/2789272.2912077","url":null,"abstract":"The Libra Toolkit is a collection of algorithms for learning and inference with discrete probabilistic models, including Bayesian networks, Markov networks, dependency networks, and sum-product networks. Compared to other toolkits, Libra places a greater emphasis on learning the structure of tractable models in which exact inference is efficient. It also includes a variety of algorithms for learning graphical models in which inference is potentially intractable, and for performing exact and approximate inference. Libra is released under a 2-clause BSD license to encourage broad use in academia and industry.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"26 1","pages":"2459-2463"},"PeriodicalIF":0.0,"publicationDate":"2015-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81443275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Risk bounds for the majority vote: from a PAC-Bayesian analysis to a learning algorithm
Pascal Germain, A. Lacasse, François Laviolette, M. Marchand, Jean-Francis Roy
We propose an extensive analysis of the behavior of majority votes in binary classification. In particular, we introduce a risk bound for majority votes, called the C-bound, that takes into account the average quality of the voters and their average disagreement. We also propose an extensive PAC-Bayesian analysis that shows how the C-bound can be estimated from various observations contained in the training data. The analysis is intended to be self-contained and can be used as introductory material to PAC-Bayesian statistical learning theory. It starts from a general PAC-Bayesian perspective and ends with uncommon PAC-Bayesian bounds. Some of these bounds contain no Kullback-Leibler divergence and others allow kernel functions to be used as voters (via the sample compression setting). Finally, out of this analysis, we propose the MinCq learning algorithm, which essentially minimizes the C-bound. MinCq reduces to a simple quadratic program. Aside from being theoretically grounded, MinCq achieves state-of-the-art performance, as shown in our extensive empirical comparison with both AdaBoost and the Support Vector Machine.
{"title":"Risk bounds for the majority vote: from a PAC-Bayesian analysis to a learning algorithm","authors":"Pascal Germain, A. Lacasse, François Laviolette, M. Marchand, Jean-Francis Roy","doi":"10.5555/2789272.2831140","DOIUrl":"https://doi.org/10.5555/2789272.2831140","url":null,"abstract":"We propose an extensive analysis of the behavior of majority votes in binary classification. In particular, we introduce a risk bound for majority votes, called the C-bound, that takes into account the average quality of the voters and their average disagreement. We also propose an extensive PAC-Bayesian analysis that shows how the C-bound can be estimated from various observations contained in the training data. The analysis intends to be self-contained and can be used as introductory material to PAC-Bayesian statistical learning theory. It starts from a general PAC-Bayesian perspective and ends with uncommon PAC-Bayesian bounds. Some of these bounds contain no Kullback-Leibler divergence and others allow kernel functions to be used as voters (via the sample compression setting). Finally, out of the analysis, we propose the MinCq learning algorithm that basically minimizes the C-bound. MinCq reduces to a simple quadratic program. Aside from being theoretically grounded, MinCq achieves state-of-the-art performance, as shown in our extensive empirical comparison with both AdaBoost and the Support Vector Machine.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"81 1","pages":"787-860"},"PeriodicalIF":0.0,"publicationDate":"2015-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85481769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Existence and uniqueness of proper scoring rules
E. Ovcharov
To discuss the existence and uniqueness of proper scoring rules, one needs to extend the associated entropy functions as sublinear functions to the conic hull of the prediction set. In some natural function spaces, such as the Lebesgue $L^p$-spaces over $\mathbb{R}^d$, the positive cones have empty interior. Entropy functions defined on such cones have directional derivatives only, which typically exist on large subspaces and behave similarly to gradients. Certain entropies may be further extended continuously to open cones in normed spaces containing signed densities. The extended entropies are Gâteaux differentiable except on a negligible set and have everywhere continuous subgradients due to the supporting hyperplane theorem. We introduce the necessary framework from analysis and algebra that allows us to give an affirmative answer to the titular question of the paper. As a result, we give a formal sense in which entropy functions have uniquely associated proper scoring rules. We illustrate our framework by studying the derivatives and subgradients of the following three prototypical entropies: Shannon entropy, Hyvärinen entropy, and quadratic entropy.
{"title":"Existence and uniqueness of proper scoring rules","authors":"E. Ovcharov","doi":"10.5555/2789272.2886820","DOIUrl":"https://doi.org/10.5555/2789272.2886820","url":null,"abstract":"To discuss the existence and uniqueness of proper scoring rules one needs to extend the associated entropy functions as sublinear functions to the conic hull of the prediction set. In some natural function spaces, such as the Lebesgue Lp-spaces over Rd, the positive cones have empty interior. Entropy functions defined on such cones have directional derivatives only, which typically exist on large subspaces and behave similarly to gradients. Certain entropies may be further extended continuously to open cones in normed spaces containing signed densities. The extended densities are Gâteaux differentiable except on a negligible set and have everywhere continuous subgradients due to the supporting hyperplane theorem. We introduce the necessary framework from analysis and algebra that allows us to give an affirmative answer to the titular question of the paper. As a result of this, we give a formal sense in which entropy functions have uniquely associated proper scoring rules. We illustrate our framework by studying the derivatives and subgradients of the following three prototypical entropies: Shannon entropy, Hyvarinen entropy, and quadratic entropy.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"12 1","pages":"2207-2230"},"PeriodicalIF":0.0,"publicationDate":"2015-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76915820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Photonic delay systems as machine learning implementations
Michiel Hermans, M. C. Soriano, J. Dambre, P. Bienstman, Ingo Fischer
Nonlinear photonic delay systems present interesting implementation platforms for machine learning models. They can be extremely fast, offer a high degree of parallelism, and potentially consume far less power than digital processors. So far they have been successfully employed for signal processing using the Reservoir Computing paradigm. In this paper we show that their range of applicability can be greatly extended if we use gradient descent with backpropagation through time on a model of the system to optimize the input encoding. We perform physical experiments demonstrating that the obtained input encodings work well in reality, and we show that optimized systems perform significantly better than the common Reservoir Computing approach. The results presented here demonstrate that common gradient descent techniques from machine learning may well be applicable to physical neuro-inspired analog computers.
{"title":"Photonic delay systems as machine learning implementations","authors":"Michiel Hermans, M. C. Soriano, J. Dambre, P. Bienstman, Ingo Fischer","doi":"10.5555/2789272.2886817","DOIUrl":"https://doi.org/10.5555/2789272.2886817","url":null,"abstract":"Nonlinear photonic delay systems present interesting implementation platforms for machine learning models. They can be extremely fast, offer great degrees of parallelism and potentially consume far less power than digital processors. So far they have been successfully employed for signal processing using the Reservoir Computing paradigm. In this paper we show that their range of applicability can be greatly extended if we use gradient descent with backpropagation through time on a model of the system to optimize the input encoding of such systems. We perform physical experiments that demonstrate that the obtained input encodings work well in reality, and we show that optimized systems perform significantly better than the common Reservoir Computing approach. The results presented here demonstrate that common gradient descent techniques from machine learning may well be applicable on physical neuro-inspired analog computers.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"112 1","pages":"2081-2097"},"PeriodicalIF":0.0,"publicationDate":"2015-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76787368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Discrete reproducing kernel Hilbert spaces: sampling and distribution of Dirac-masses
P. Jorgensen, Feng Tian
We study reproducing kernels, and the associated reproducing kernel Hilbert spaces (RKHSs) $\mathscr{H}$, over infinite, discrete, and countable sets $V$. In this setting we analyze in detail the distributions of the corresponding Dirac point-masses of $V$. Illustrations include certain models from neural networks: an Extreme Learning Machine (ELM) is a neural-network configuration in which a hidden layer of weights is randomly sampled, and the object is then to compute the resulting output. For RKHSs $\mathscr{H}$ of functions defined on a prescribed countably infinite discrete set $V$, we characterize those which contain the Dirac masses $\delta_{x}$ for all points $x$ in $V$. Further examples and applications where this question plays an important role are: (i) discrete Brownian-motion Hilbert spaces, i.e., discrete versions of the Cameron-Martin Hilbert space; (ii) energy Hilbert spaces corresponding to graph Laplacians, where the set $V$ of vertices is then equipped with a resistance metric; and finally (iii) the study of Gaussian free fields.
{"title":"Discrete reproducing kernel Hilbert spaces: sampling and distribution of Dirac-masses","authors":"P. Jorgensen, Feng Tian","doi":"10.5555/2789272.2912098","DOIUrl":"https://doi.org/10.5555/2789272.2912098","url":null,"abstract":"We study reproducing kernels, and associated reproducing kernel Hilbert spaces (RKHSs) $mathscr{H}$ over infinite, discrete and countable sets $V$. In this setting we analyze in detail the distributions of the corresponding Dirac point-masses of $V$. Illustrations include certain models from neural networks: An Extreme Learning Machine (ELM) is a neural network-configuration in which a hidden layer of weights are randomly sampled, and where the object is then to compute resulting output. For RKHSs $mathscr{H}$ of functions defined on a prescribed countable infinite discrete set $V$, we characterize those which contain the Dirac masses $delta_{x}$ for all points $x$ in $V$. Further examples and applications where this question plays an important role are: (i) discrete Brownian motion-Hilbert spaces, i.e., discrete versions of the Cameron-Martin Hilbert space; (ii) energy-Hilbert spaces corresponding to graph-Laplacians where the set $V$ of vertices is then equipped with a resistance metric; and finally (iii) the study of Gaussian free fields.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"31 1","pages":"3079-3114"},"PeriodicalIF":0.0,"publicationDate":"2015-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77194683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Links between multiplicity automata, observable operator models and predictive state representations: a unified learning framework
Michael R. Thon, H. Jaeger
Stochastic multiplicity automata (SMA) are weighted finite automata that generalize probabilistic automata. They have been used in the context of probabilistic grammatical inference. Observable operator models (OOMs) are a generalization of hidden Markov models, which in turn are models for discrete-valued stochastic processes and are used ubiquitously in the context of speech recognition and bio-sequence modeling. Predictive state representations (PSRs) extend OOMs to stochastic input-output systems and are employed in the context of agent modeling and planning. We present SMA, OOMs, and PSRs under the common framework of sequential systems, which are an algebraic characterization of multiplicity automata, and examine the precise relationships between them. Furthermore, we establish a unified approach to learning such models from data. Many of the learning algorithms that have been proposed can be understood as variations of this basic learning scheme, and several turn out to be closely related to each other, or even equivalent.
{"title":"Links between multiplicity automata, observable operator models and predictive state representations: a unified learning framework","authors":"Michael R. Thon, H. Jaeger","doi":"10.5555/2789272.2789276","DOIUrl":"https://doi.org/10.5555/2789272.2789276","url":null,"abstract":"Stochastic multiplicity automata (SMA) are weighted finite automata that generalize probabilistic automata. They have been used in the context of probabilistic grammatical inference. Observable operator models (OOMs) are a generalization of hidden Markov models, which in turn are models for discrete-valued stochastic processes and are used ubiquitously in the context of speech recognition and bio-sequence modeling. Predictive state representations (PSRs) extend OOMs to stochastic input-output systems and are employed in the context of agent modeling and planning. We present SMA, OOMs, and PSRs under the common framework of sequential systems, which are an algebraic characterization of multiplicity automata, and examine the precise relationships between them. Furthermore, we establish a unified approach to learning such models from data. Many of the learning algorithms that have been proposed can be understood as variations of this basic learning scheme, and several turn out to be closely related to each other, or even equivalent.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"13 1","pages":"103-147"},"PeriodicalIF":0.0,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75702845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Preface to this special issue
A. Gammerman, V. Vovk
This issue of JMLR is devoted to the memory of Alexey Chervonenkis. Over the period of a dozen years between 1962 and 1973, he and Vladimir Vapnik created a new discipline, statistical learning theory: the foundation on which all our modern understanding of pattern recognition is based. Alexey was 28 years old when they made their most famous and original discovery, the uniform law of large numbers. In that short period Vapnik and Chervonenkis also introduced the main concepts of statistical learning theory, such as VC dimension, capacity control, and the Structural Risk Minimization principle, and designed two powerful pattern recognition methods, Generalised Portrait and Optimal Separating Hyperplane, later transformed by Vladimir Vapnik into the Support Vector Machine, arguably one of the best tools for pattern recognition and regression estimation. Thereafter Alexey continued to publish original and important contributions to learning theory. He was also active in research in several applied fields, including geology, bioinformatics, medicine, and advertising. Alexey tragically died in September 2014 after getting lost during a hike in the Elk Island park on the outskirts of Moscow. Vladimir Vapnik suggested preparing an issue of JMLR to be published on the first anniversary of the death of his long-term collaborator and close friend. Vladimir and the editors contacted a few dozen leading researchers in the fields of machine learning related to Alexey's research interests and received many enthusiastic replies. In the end, eleven papers were accepted. This issue also contains a first attempt at a complete bibliography of Alexey Chervonenkis's publications. Alexey's Festschrift (Vovk et al., 2015) will appear simultaneously with this special issue; the reader is referred to it for information about Alexey's research, life, and death. The Festschrift is based in part on a symposium held in Paphos, Cyprus, in 2013 to celebrate Alexey's 75th birthday. Apart from research contributions, it contains Alexey's reminiscences about his early work on statistical learning with Vladimir Vapnik, a reprint of their seminal 1971 paper, a historical chapter by R. M. Dudley, reminiscences of Alexey's and Vladimir's close colleague Vasily Novoseltsev, and three reviews of various measures of complexity used in machine learning ("Measures of Complexity" is both the name of the symposium and the title of the book). Among Alexey's contributions to machine learning (mostly joint with Vladimir Vapnik) discussed in the book are:
{"title":"Preface to this special issue","authors":"A. Gammerman, V. Vovk","doi":"10.5555/2789272.2886803","DOIUrl":"https://doi.org/10.5555/2789272.2886803","url":null,"abstract":"This issue of JMLR is devoted to the memory of Alexey Chervonenkis. Over the period of a dozen years between 1962 and 1973 he and Vladimir Vapnik created a new discipline of statistical learning theory—the foundation on which all our modern understanding of pattern recognition is based. Alexey was 28 years old when they made their most famous and original discovery, the uniform law of large numbers. In that short period Vapnik and Chervonenkis also introduced the main concepts of statistical learning theory, such as VCdimension, capacity control, and the Structural Risk Minimization principle, and designed two powerful pattern recognition methods, Generalised Portrait and Optimal Separating Hyperplane, later transformed by Vladimir Vapnik into Support Vector Machine—arguably one of the best tools for pattern recognition and regression estimation. Thereafter Alexey continued to publish original and important contributions to learning theory. He was also active in research in several applied fields, including geology, bioinformatics, medicine, and advertising. Alexey tragically died in September 2014 after getting lost during a hike in the Elk Island park on the outskirts of Moscow. Vladimir Vapnik suggested to prepare an issue of JMLR to be published at the first anniversary of the death of his long-term collaborator and close friend. Vladimir and the editors contacted a few dozen leading researchers in the fields of machine learning related to Alexey’s research interests and had many enthusiastic replies. In the end eleven papers were accepted. This issue also contains a first attempt at a complete bibliography of Alexey Chervonenkis’s publications. Simultaneously with this special issue will appear Alexey’s Festschrift (Vovk et al., 2015), to which the reader is referred for information about Alexey’s research, life, and death. The Festschrift is based in part on a symposium held in Pathos, Cyprus, in 2013 to celebrate Alexey’s 75th anniversary. Apart from research contributions, it contains Alexey’s reminiscences about his early work on statistical learning with Vladimir Vapnik, a reprint of their seminal 1971 paper, a historical chapter by R. M. Dudley, reminiscences of Alexey’s and Vladimir’s close colleague Vasily Novoseltsev, and three reviews of various measures of complexity used in machine learning (“Measures of Complexity” is both the name of the symposium and the title of the book). Among Alexey’s contributions to machine learning (mostly joint with Vladimir Vapnik) discussed in the book are:","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"30 1","pages":"1677-1681"},"PeriodicalIF":0.0,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73946949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semi-supervised interpolation in an anticausal learning scenario
D. Janzing, B. Schölkopf
According to a recently stated 'independence postulate', the distribution $P_{\mathrm{cause}}$ contains no information about the conditional $P_{\mathrm{effect}\mid\mathrm{cause}}$, while $P_{\mathrm{effect}}$ may contain information about $P_{\mathrm{cause}\mid\mathrm{effect}}$. Since semi-supervised learning (SSL) attempts to exploit information from $P_X$ to assist in predicting $Y$ from $X$, it should only work in the anticausal direction, i.e., when $Y$ is the cause and $X$ is the effect. In the causal direction, when $X$ is the cause and $Y$ the effect, unlabelled $x$-values should be useless. To shed light on this asymmetry, we study a deterministic causal relation $Y = f(X)$, as recently analyzed in Information-Geometric Causal Inference (IGCI). Within this model, we discuss two options for formalizing the independence of $P_X$ and $f$ as an orthogonality of vectors in appropriate inner product spaces. We prove that unlabelled data help for the problem of interpolating a monotonically increasing function if and only if the orthogonality conditions are violated, which we expect only for the anticausal direction. Here, the performance of SSL and its supervised baseline analogue is measured in terms of two different loss functions: first, the mean squared error, and second, the surprise in a Bayesian prediction scenario.
{"title":"Semi-supervised interpolation in an anticausal learning scenario","authors":"D. Janzing, B. Scholkopf","doi":"10.5555/2789272.2886811","DOIUrl":"https://doi.org/10.5555/2789272.2886811","url":null,"abstract":"According to a recently stated 'independence postulate', the distribution Pcause contains no information about the conditional Peffect|cause while Peffect may contain information about Pcause|effect. Since semi-supervised learning (SSL) attempts to exploit information from PX to assist in predicting Y from X, it should only work in anticausal direction, i.e., when Y is the cause and X is the effect. In causal direction, when X is the cause and Y the effect, unlabelled x-values should be useless. To shed light on this asymmetry, we study a deterministic causal relation Y = f(X) as recently assayed in Information-Geometric Causal Inference (IGCI). Within this model, we discuss two options to formalize the independence of PX and f as an orthogonality of vectors in appropriate inner product spaces. We prove that unlabelled data help for the problem of interpolating a monotonically increasing function if and only if the orthogonality conditions are violated - which we only expect for the anticausal direction. Here, performance of SSL and its supervised baseline analogue is measured in terms of two different loss functions: first, the mean squared error and second the surprise in a Bayesian prediction scenario.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"9 1","pages":"1923-1948"},"PeriodicalIF":0.0,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84605148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A view of margin losses as regularizers of probability estimates
Hamed Masnadi-Shirazi, N. Vasconcelos
Regularization is commonly used in classifier design to ensure good generalization. Classical regularization enforces a cost on classifier complexity by constraining parameters. This is usually combined with a margin loss, which favors large-margin decision rules. A novel and unified view of this architecture is proposed, by showing that margin losses act as regularizers of posterior class probabilities, in a way that amplifies classical parameter regularization. The problem of controlling the regularization strength of a margin loss is considered, using a decomposition of the loss in terms of a link and a binding function. The link function is shown to be responsible for the regularization strength of the loss, while the binding function determines its outlier robustness. A large class of losses is then categorized into equivalence classes of identical regularization strength or outlier robustness. It is shown that losses in the same regularization class can be parameterized so as to have tunable regularization strength. This parameterization is finally used to derive boosting algorithms with loss regularization (BoostLR). Three classes of tunable regularization losses are considered in detail. Canonical losses can implement all regularization behaviors but have no flexibility in terms of outlier modeling. Shrinkage losses support equally parameterized link and binding functions, leading to boosting algorithms that implement the popular shrinkage procedure. This offers a new explanation for shrinkage as a special case of loss-based regularization. Finally, α-tunable losses enable the independent parameterization of link and binding functions, leading to boosting algorithms of great flexibility. This is illustrated by the derivation of an algorithm that generalizes both AdaBoost and LogitBoost, behaving as whichever best suits the data to be classified. Various experiments provide evidence of the benefits of probability regularization for both classification and estimation of posterior class probabilities.
{"title":"A view of margin losses as regularizers of probability estimates","authors":"Hamed Masnadi-Shirazi, N. Vasconcelos","doi":"10.5555/2789272.2912087","DOIUrl":"https://doi.org/10.5555/2789272.2912087","url":null,"abstract":"Regularization is commonly used in classifier design, to assure good generalization. Classical regularization enforces a cost on classifier complexity, by constraining parameters. This is usually combined with a margin loss, which favors large-margin decision rules. A novel and unified view of this architecture is proposed, by showing that margin losses act as regularizers of posterior class probabilities, in a way that amplifies classical parameter regularization. The problem of controlling the regularization strength of a margin loss is considered, using a decomposition of the loss in terms of a link and a binding function. The link function is shown to be responsible for the regularization strength of the loss, while the binding function determines its outlier robustness. A large class of losses is then categorized into equivalence classes of identical regularization strength or outlier robustness. It is shown that losses in the same regularization class can be parameterized so as to have tunable regularization strength. This parameterization is finally used to derive boosting algorithms with loss regularization (BoostLR). Three classes of tunable regularization losses are considered in detail. Canonical losses can implement all regularization behaviors but have no flexibility in terms of outlier modeling. Shrinkage losses support equally parameterized link and binding functions, leading to boosting algorithms that implement the popular shrinkage procedure. This offers a new explanation for shrinkage as a special case of loss-based regularization. Finally, α-tunable losses enable the independent parameterization of link and binding functions, leading to boosting algorithms of great exibility. This is illustrated by the derivation of an algorithm that generalizes both AdaBoost and LogitBoost, behaving as either one when that best suits the data to classify. Various experiments provide evidence of the benefits of probability regularization for both classification and estimation of posterior class probabilities.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"17 1","pages":"2751-2795"},"PeriodicalIF":0.0,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80322330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}