A new text representation scheme combining Bag-of-Words and Bag-of-Concepts approaches for automatic text classification

2013 7th IEEE GCC Conference and Exhibition (GCC) Pub Date : 2013-11-01 DOI:10.1109/IEEEGCC.2013.6705759

A. Alahmadi, Arash Joorabchi, A. Mahdi

Cited by: 23

Abstract

This paper introduces a new approach to creating text representations and applies it to standard text classification collections. The approach supplements the well-known Bag-of-Words (BOW) representation scheme with a concept-based representation that uses Wikipedia as a knowledge base. The proposed representations are used to generate a Vector Space Model, which is then fed into a Support Vector Machine (SVM) classifier to categorise textual documents from two publicly available datasets. Experimental results are reported that evaluate the performance of our model against a standard BOW scheme, a purely concept-based scheme, and recently reported text representations that likewise augment the standard BOW approach with concept-based representations.
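The following is a minimal sketch of the general idea of supplementing BOW features with concept features in a single vector space. It is not the authors' implementation. The concept dictionary `TOY_CONCEPTS` is a hypothetical stand-in for the Wikipedia-derived concept lookup, and the `w:`/`c:` feature prefixes, the greedy longest-match n-gram lookup, and the smoothed TF-IDF weighting are illustrative assumptions; the abstract does not specify how concepts are detected or weighted.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a Wikipedia-derived concept lexicon:
# maps lower-cased surface phrases to concept (article) identifiers.
TOY_CONCEPTS = {
    "support vector machine": "Support_vector_machine",
    "svm": "Support_vector_machine",
    "bag of words": "Bag-of-words_model",
    "wikipedia": "Wikipedia",
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(tokens):
    """Term counts; the 'w:' prefix keeps word features apart from concepts."""
    return Counter("w:" + t for t in tokens)


def bag_of_concepts(tokens, concept_map, max_n=3):
    """Concept counts via greedy longest-match lookup of n-grams (n <= max_n)."""
    concepts = Counter()
    i = 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in concept_map:
                concepts["c:" + concept_map[phrase]] += 1
                i += n
                break
        else:  # no phrase starting at position i matched a concept
            i += 1
    return concepts


def combined_representation(text, concept_map):
    """BOW features supplemented with BOC features in one sparse bag."""
    tokens = tokenize(text)
    return bag_of_words(tokens) + bag_of_concepts(tokens, concept_map)


def tfidf_vectors(docs, concept_map):
    """L2-normalised TF-IDF vectors (as dicts) over the combined feature space."""
    bags = [combined_representation(d, concept_map) for d in docs]
    df = Counter()
    for bag in bags:
        df.update(bag.keys())  # document frequency of each feature
    n = len(docs)
    vectors = []
    for bag in bags:
        # Smoothed IDF so features occurring in every document keep a non-zero weight.
        vec = {f: tf * (math.log((1 + n) / (1 + df[f])) + 1) for f, tf in bag.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({f: w / norm for f, w in vec.items()} if norm else vec)
    return vectors


def cosine(u, v):
    """Dot product of two L2-normalised sparse vectors."""
    return sum(w * v.get(f, 0.0) for f, w in u.items())


if __name__ == "__main__":
    docs = ["SVM classifiers", "support vector machine models", "Wikipedia articles"]
    vecs = tfidf_vectors(docs, TOY_CONCEPTS)
    # Documents 0 and 1 share no words, only the concept feature.
    print(cosine(vecs[0], vecs[1]) > 0)  # True
    print(cosine(vecs[0], vecs[2]))      # 0.0
```

The usage at the bottom shows why concepts can help: "SVM classifiers" and "support vector machine models" have no words in common, so a pure BOW model sees them as unrelated, while the shared concept feature gives them a positive similarity. To reproduce the classification stage, one option is to convert these dicts with scikit-learn's `DictVectorizer` and train a linear SVM such as `LinearSVC`; that step is not shown here, and the paper's SVM configuration is not given in the abstract.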
