An unsupervised hierarchical approach to document categorization

R. Wetzker, T. Alpcan, C. Bauckhage, Winfried Umbrath, S. Albayrak
Published in: IEEE/WIC/ACM International Conference on Web Intelligence (WI'07), 2007-11-02. DOI: 10.1109/WI.2007.21 (https://doi.org/10.1109/WI.2007.21)
Citations: 18

Abstract

We propose a hierarchical approach to document categorization that requires no pre-configuration and maps the semantic document space to a predefined taxonomy. Using search engines to train a hierarchical classifier makes our approach more flexible than existing solutions, which rely on human-labeled data and are bound to a specific domain. We show that the structural information given by the taxonomy allows for a context-aware construction of search queries and leads to higher tagging accuracy. We test our approach on different benchmark datasets and evaluate its performance on the single- and multi-tag assignment tasks. The experimental results show that our solution is as accurate as supervised classifiers for web page classification and still performs well when categorizing domain-specific documents.
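
The context-aware query construction mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the query format, so everything below is an assumption made for illustration: the toy taxonomy `PARENT`, the helper names `path_to_root` and `build_query`, the `max_context` parameter, and the choice to quote the category label and append ancestor labels as disambiguating context. It is not the authors' implementation.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy: child label -> parent label (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "Top": None,
    "Sports": "Top",
    "Computers": "Top",
    "Java": "Computers",
    "Golf": "Sports",
}


def path_to_root(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the labels from `node` upward, excluding the root itself."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None and parent[current] is not None:
        path.append(current)
        current = parent[current]
    return path


def build_query(node: str, parent: Dict[str, Optional[str]],
                max_context: int = 1) -> str:
    """Build a search query for a category, using ancestors as context.

    Assumed format: the category label in quotes, followed by up to
    `max_context` ancestor labels as plain terms.
    """
    path = path_to_root(node, parent)
    if not path:
        raise ValueError("the root category carries no label to query for")
    label, ancestors = path[0], path[1:1 + max_context]
    return " ".join([f'"{label}"'] + ancestors)


if __name__ == "__main__":
    for category in ("Java", "Golf", "Sports"):
        print(category, "->", build_query(category, PARENT))
```

In this sketch, a query for the category Java is qualified by its parent Computers, which is one plausible way the taxonomy structure could steer a search engine away from unrelated senses of the label. The documents returned for each query would then serve as training data for that node's classifier.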