Multi-concept Document Classification Using a Perceptron-Like Algorithm

Clay Woolam, L. Khan
Published in: 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology
Publication date: 2008-12-09
DOI: 10.1109/WIIAT.2008.410 (https://doi.org/10.1109/WIIAT.2008.410)
Citation count: 7

Abstract

Previous work in hierarchical categorization focuses on the hierarchical perceptron (Hieron) algorithm. The hierarchical perceptron works on the principles of the perceptron; that is, each class label in the hierarchy has an associated weight vector. To account for the hierarchy, we begin at the root of the tree and sum all weights along the path to the target label. We make a prediction by choosing the label that yields the maximum inner product of the document's feature vector with the label's path-summed weights. Learning is done by adjusting the weights along the path from the predicted node to the correct node according to a specific loss function that adheres to the large-margin principle. There are several problems with applying this approach to a problem in which a document can belong to multiple classes. In many cases we could end up punishing weights that gave a correct prediction, because the algorithm can only take a single case at a time. In this paper we present an extended hierarchical perceptron algorithm capable of solving the multiple categorization problem (MultiHieron). We introduce a new aggregate loss function for multiple-label learning, and we make weight updates simultaneously instead of serially. We demonstrate significant improvement over the basic Hieron algorithm on the Aviation Safety Reporting System (ASRS) flight anomaly database and the OntoNews corpus, using both flat and hierarchical categorization metrics.
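
The path-summed scoring and the idea of a simultaneous update can be illustrated with a small Python sketch. This is not the paper's MultiHieron implementation: the abstract does not give the aggregate loss function or the step size, so the fixed step `eta`, the toy label tree, and every function name below are assumptions made for illustration only. The sketch shows one plausible way to apply all label corrections in a single step so that nodes shared by a correct and an incorrect prediction are left untouched.

```python
# Illustrative sketch only. The tree, names, and fixed step size `eta`
# are assumptions; the paper's aggregate loss is not reproduced here.

def path_to_root(node, parent):
    """Nodes from `node` up to the root, given a child -> parent map."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def score(node, x, weights, parent):
    """Inner product of x with the summed weights on the root-to-node path."""
    return sum(dot(weights[u], x) for u in path_to_root(node, parent))

def predict(x, weights, parent):
    """Single-label prediction: the label with the highest path-summed score."""
    return max(sorted(weights), key=lambda v: score(v, x, weights, parent))

def simultaneous_update(x, true_labels, predicted_labels, weights, parent, eta=1.0):
    """Collect the nodes to reward and to penalize over all labels, then update once."""
    true_nodes, pred_nodes = set(), set()
    for y in true_labels:
        true_nodes.update(path_to_root(y, parent))
    for y in predicted_labels:
        pred_nodes.update(path_to_root(y, parent))
    # Nodes on both a true path and a predicted path are left unchanged.
    for u in true_nodes - pred_nodes:
        weights[u] = [w + eta * xi for w, xi in zip(weights[u], x)]
    for u in pred_nodes - true_nodes:
        weights[u] = [w - eta * xi for w, xi in zip(weights[u], x)]

# Hypothetical label tree: root -> {A, B}, A -> {A1, A2}.
parent = {"root": None, "A": "root", "B": "root", "A1": "A", "A2": "A"}
weights = {node: [0.0, 0.0] for node in parent}
x = [1.0, 0.0]

# The document truly has labels A1 and B, but A1 and A2 were predicted.
simultaneous_update(x, {"A1", "B"}, {"A1", "A2"}, weights, parent)
```

In this toy case only `B` is rewarded and only `A2` is penalized; `root`, `A`, and `A1` keep their weights because they lie on the path to the correctly predicted label `A1`. A serial, one-label-at-a-time update could instead penalize `A` while correcting `A2`, which is the kind of problem the abstract describes.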