Experimental analysis of new algorithms for learning ternary classifiers

Jean-Daniel Zucker, Y. Chevaleyre, Dao Van Sang
DOI: 10.1109/RIVF.2015.7049868
Venue: The 2015 IEEE RIVF International Conference on Computing & Communication Technologies - Research, Innovation, and Vision for Future (RIVF)
Published: 2015-02-26
Citations: 2

Abstract

Discrete linear classifiers form a very sparse class of decision models that has proved useful for reducing overfitting in very high-dimensional learning problems. Learning a discrete linear classifier, however, is known to be difficult: it requires finding a discrete linear model that minimizes the classification error over a given sample. A ternary classifier is a classifier defined by a pair (w, r), where w is a weight vector in {-1, 0, +1}^n and r is a nonnegative real capturing the threshold, or offset. The goal of the learning algorithm is to find a weight vector in {-1, 0, +1}^n that minimizes the hinge loss of the linear model on the training data. This problem is NP-hard; one approach is to solve the relaxed continuous problem exactly and then heuristically derive discrete solutions from it. A recent paper by the authors introduced a randomized rounding algorithm [1]; in this paper we propose more sophisticated algorithms that improve the generalization error. These algorithms are presented and their performance is experimentally analyzed. Our results show that this kind of compact model can address the complex problem of learning predictors from bioinformatics data, such as metagenomics data, where the number of samples is much smaller than the number of attributes. The new algorithms improve on the state-of-the-art algorithm for learning ternary classifiers, though this improvement comes at the expense of time complexity.
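To make the relax-then-round approach concrete, here is a minimal sketch of one common randomized rounding scheme for this setting: each coordinate of a continuous solution w ∈ [-1, 1]^n is rounded to sign(w_i) with probability |w_i| and to 0 otherwise, so that the ternary vector equals w in expectation. This is an illustration of the general technique, not necessarily the exact procedure of [1] or of the new algorithms; the function names and the hinge-loss scoring loop are our own.

```python
import numpy as np

def randomized_rounding(w, rng=None):
    """Round a continuous vector w in [-1, 1]^n to a ternary vector in
    {-1, 0, +1}^n. Each coordinate becomes sign(w_i) with probability
    |w_i|, else 0, so E[w_hat] = w (unbiased rounding)."""
    rng = np.random.default_rng() if rng is None else rng
    w = np.clip(np.asarray(w, dtype=float), -1.0, 1.0)
    keep = rng.random(w.shape) < np.abs(w)
    return np.where(keep, np.sign(w), 0.0).astype(int)

def hinge_loss(w, X, y, r=0.0):
    """Average hinge loss of the linear model <w, x> - r on labels y in {-1, +1}."""
    margins = y * (X @ w - r)
    return np.mean(np.maximum(0.0, 1.0 - margins))

def best_of_k_roundings(w_cont, X, y, r=0.0, k=50, rng=None):
    """Draw k ternary roundings of the continuous solution and keep the
    one with the lowest empirical hinge loss on the training sample."""
    rng = np.random.default_rng() if rng is None else rng
    candidates = [randomized_rounding(w_cont, rng) for _ in range(k)]
    return min(candidates, key=lambda w: hinge_loss(w, X, y, r))
```

Drawing several roundings and keeping the best one on the training sample is a cheap way to recover some of the quality lost in discretization; the more sophisticated algorithms studied in the paper trade additional running time for better generalization.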