Experimental analysis of new algorithms for learning ternary classifiers

Jean-Daniel Zucker, Y. Chevaleyre, Dao Van Sang
DOI: 10.1109/RIVF.2015.7049868
Venue: The 2015 IEEE RIVF International Conference on Computing & Communication Technologies - Research, Innovation, and Vision for Future (RIVF)
Published: 2015-02-26
Citations: 2

Abstract

Discrete linear classifiers form a very sparse class of decision models that has proved useful for reducing overfitting in very high-dimensional learning problems. Learning a discrete linear classifier, however, is known to be difficult: it requires finding a discrete linear model that minimizes the classification error over a given sample. A ternary classifier is a classifier defined by a pair (w, r), where w is a weight vector in {-1, 0, +1}^n and r is a nonnegative real capturing the threshold, or offset. The goal of the learning algorithm is to find a weight vector in {-1, 0, +1}^n that minimizes the hinge loss of the linear model on the training data. This problem is NP-hard; one approach is to solve the relaxed continuous problem exactly and then heuristically derive discrete solutions from it. A recent paper by the authors introduced a randomized rounding algorithm [1]; in this paper we propose more sophisticated algorithms that improve the generalization error. These algorithms are presented and their performance is experimentally analyzed. Our results show that this kind of compact model can address the complex problem of learning predictors from bioinformatics data, such as metagenomics data, where the number of samples is much smaller than the number of attributes. The new algorithms improve on the state-of-the-art algorithm for learning ternary classifiers, though this improvement comes at the expense of time complexity.
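To make the relax-then-round approach concrete, here is a minimal sketch of one common randomized rounding scheme for this setting: each coordinate of a continuous solution w ∈ [-1, 1]^n is rounded to sign(w_i) with probability |w_i| and to 0 otherwise, so that the ternary vector equals w in expectation. This is an illustration of the general technique, not necessarily the exact procedure of [1] or of the new algorithms; the function names and the hinge-loss scoring loop are our own.

```python
import numpy as np

def randomized_rounding(w, rng=None):
    """Round a continuous vector w in [-1, 1]^n to a ternary vector in
    {-1, 0, +1}^n. Each coordinate becomes sign(w_i) with probability
    |w_i|, else 0, so E[w_hat] = w (unbiased rounding)."""
    rng = np.random.default_rng() if rng is None else rng
    w = np.clip(np.asarray(w, dtype=float), -1.0, 1.0)
    keep = rng.random(w.shape) < np.abs(w)
    return np.where(keep, np.sign(w), 0.0).astype(int)

def hinge_loss(w, X, y, r=0.0):
    """Average hinge loss of the linear model <w, x> - r on labels y in {-1, +1}."""
    margins = y * (X @ w - r)
    return np.mean(np.maximum(0.0, 1.0 - margins))

def best_of_k_roundings(w_cont, X, y, r=0.0, k=50, rng=None):
    """Draw k ternary roundings of the continuous solution and keep the
    one with the lowest empirical hinge loss on the training sample."""
    rng = np.random.default_rng() if rng is None else rng
    candidates = [randomized_rounding(w_cont, rng) for _ in range(k)]
    return min(candidates, key=lambda w: hinge_loss(w, X, y, r))
```

Drawing several roundings and keeping the best one on the training sample is a cheap way to recover some of the quality lost in discretization; the more sophisticated algorithms studied in the paper trade additional running time for better generalization.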