Semi-supervised Text Classification Using SVM with Exponential Kernel

International journal of database theory and application, pp. 79-88. Published 2017-01-31. DOI: 10.14257/IJDTA.2017.10.1.08

Liyun Zhong

Cited by: 1

Abstract

Kernel-based learning methods (kernel methods for short) in general, and the support vector machine (SVM) in particular, have been applied successfully to text classification. This is mainly due to their relatively high classification accuracy in several application domains and their ability to handle high-dimensional, sparse data, the characteristics that make textual data representations challenging. A significant challenge in text classification is to reduce the need for labeled training data while maintaining acceptable performance. This paper presents a semi-supervised technique for text classification based on the exponential kernel. Specifically, the semantic similarities between terms are first learned from both labeled and unlabeled training data by means of a diffusion process on a graph defined by lexicon and co-occurrence information; the exponential kernel is then constructed from the learned semantic similarities. Finally, the SVM classifier trains one model per class during the training phase, and these models are then applied to all test examples in the test phase. The main feature of this approach is that it uses the exponential kernel to reveal the semantic similarities between terms in an unsupervised manner, which provides a kernel framework for semi-supervised learning. The proposed approach is evaluated on several benchmark text classification data sets, and the experimental results show that it can significantly improve classification performance.
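The abstract does not spell out the kernel itself. One standard way to write an exponential (diffusion) kernel over a term graph is given below; treat it as an assumed formulation, not necessarily the paper's exact definition. Here $X \in \mathbb{R}^{n \times m}$ holds the term weights of all $n$ labeled and unlabeled training documents over $m$ terms, $\mathbf{x}_i$ is the $i$-th row of $X$, $L$ is a hypothetical lexicon adjacency matrix (for example, synonym links) with weight $\gamma \ge 0$, self-loops are removed from $A$, and $\beta > 0$ is the diffusion rate.

```latex
\begin{aligned}
A &= X^{\top} X + \gamma L, \qquad
\hat{A} = D^{-1/2} A\, D^{-1/2}, \qquad D = \operatorname{diag}\Big(\textstyle\sum_{j} A_{ij}\Big),\\
S &= \exp\big(\beta \hat{A}\big) = \sum_{k=0}^{\infty} \frac{\beta^{k}}{k!}\,\hat{A}^{k}
   = U \operatorname{diag}\big(e^{\beta\lambda_1}, \dots, e^{\beta\lambda_m}\big)\, U^{\top},
   \qquad \hat{A} = U \operatorname{diag}(\lambda_1, \dots, \lambda_m)\, U^{\top},\\
\kappa(d_i, d_j) &= \mathbf{x}_i^{\top} S\, \mathbf{x}_j, \qquad K = X S X^{\top}.
\end{aligned}
```

Because $(\hat{A}^k)_{ij}$ aggregates paths of length $k$ between terms $i$ and $j$, the factor $\beta^k/k!$ lets two terms become similar through intermediate terms while discounting long paths. Every eigenvalue $e^{\beta\lambda_i}$ of $S$ is positive, so $S$ is positive definite and $K$ is positive semidefinite, which makes $K$ a valid SVM kernel.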
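The pipeline summarized in the abstract (diffusion over a term graph built from labeled and unlabeled documents, an exponential kernel, then one SVM per class) can be sketched end to end. This is a minimal NumPy-only sketch under stated assumptions, not the author's implementation: the term graph is the co-occurrence matrix `X.T @ X`, optionally plus a hypothetical `lexicon` adjacency weighted by a hypothetical `gamma`; the diffusion is the matrix exponential of the normalized graph; and each per-class model is a simplified bias-free kernel SVM solved by dual coordinate descent, standing in for a full solver such as LIBSVM. All function names are hypothetical.

```python
import numpy as np


def exponential_term_kernel(X_all, lexicon=None, beta=0.5, gamma=1.0):
    """Term-term similarity S = exp(beta * A_hat) from a diffusion on a term graph.

    X_all:   (n_docs, n_terms) term weights of labeled AND unlabeled documents.
    lexicon: optional symmetric (n_terms, n_terms) matrix of lexical links
             (hypothetical input, e.g. synonym pairs from a thesaurus).
    """
    A = X_all.T @ X_all                      # term co-occurrence graph (symmetric)
    if lexicon is not None:
        A = A + gamma * lexicon              # add lexicon edges with weight gamma
    np.fill_diagonal(A, 0.0)                 # drop self-loops
    deg = A.sum(axis=1)
    safe = np.where(deg > 0, deg, 1.0)       # avoid division by zero for isolated terms
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(safe), 0.0)
    A_hat = inv_sqrt[:, None] * A * inv_sqrt[None, :]  # D^-1/2 A D^-1/2
    # exp of a symmetric matrix: U diag(exp(beta*lam)) U^T, all eigenvalues > 0
    lam, U = np.linalg.eigh(A_hat)
    return (U * np.exp(beta * lam)) @ U.T


def document_kernel(X1, X2, S):
    """K[i, j] = x1_i^T S x2_j: compare documents through learned term similarity."""
    return X1 @ S @ X2.T


def train_binary_svm(K, y, C=1.0, epochs=200):
    """Simplified kernel SVM (y in {-1, +1}) solved by dual coordinate descent.

    The bias is absorbed by adding a constant 1 to the kernel, so there is
    no equality constraint and each alpha_i has a closed-form update.
    Returns the dual coefficients alpha_i * y_i.
    """
    Kb = K + 1.0
    alpha = np.zeros(len(y))
    for _ in range(epochs):
        for i in range(len(y)):
            if Kb[i, i] <= 0:
                continue
            # gradient of 0.5 a^T Q a - sum(a) w.r.t. alpha_i, with Q_ij = y_i y_j Kb_ij
            g = y[i] * np.dot(alpha * y, Kb[i]) - 1.0
            alpha[i] = min(max(alpha[i] - g / Kb[i, i], 0.0), C)  # clip to [0, C]
    return alpha * y


def train_one_vs_rest(K_train, labels, C=1.0):
    """Train one binary SVM per class (one-vs-rest)."""
    classes = np.unique(labels)
    coefs = {c: train_binary_svm(K_train, np.where(labels == c, 1.0, -1.0), C)
             for c in classes}
    return classes, coefs


def predict(K_test_train, classes, coefs):
    """Apply every class model to all test documents and pick the highest score."""
    scores = np.column_stack([(K_test_train + 1.0) @ coefs[c] for c in classes])
    return classes[np.argmax(scores, axis=1)]


# Toy data: terms 0/1 are related, terms 2/3 are related.
X_lab = np.array([[1, 0, 0, 0], [0, 0, 1, 0]], dtype=float)
y_lab = np.array([0, 1])
X_unl = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)  # unlabeled documents
S = exponential_term_kernel(np.vstack([X_lab, X_unl]), beta=1.0)
classes, coefs = train_one_vs_rest(document_kernel(X_lab, X_lab, S), y_lab)
X_test = np.array([[0, 1, 0, 0], [0, 0, 0, 1]], dtype=float)  # terms unseen in labeled docs
print(predict(document_kernel(X_test, X_lab, S), classes, coefs))  # prints [0 1]
```

In the toy data, the test documents use only terms 1 and 3, which never occur in the labeled documents. The unlabeled documents connect them to terms 0 and 2, so the diffusion gives the test documents nonzero similarity to the labeled examples. Without the unlabeled rows the term graph has no edges, `S` reduces to the identity, and every test-to-training kernel value is zero.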