Kernel based Learning Suitable for Text Categorization

T. Jo, Malrey Lee
{"title":"Kernel based Learning Suitable for Text Categorization","authors":"T. Jo, Malrey Lee","doi":"10.1109/SERA.2007.97","DOIUrl":null,"url":null,"abstract":"This research proposes a new strategy where documents are encoded into string vectors for text categorization and modified versions of SVM to be adaptable to string vectors. Traditionally, when the supervised machine learning algorithms are used for pattern classification, raw data should be encoded into numerical vectors. This encoding may be difficult, depending on a given application area of pattern classification. For example, in text categorization, encoding full texts given as raw data into numerical vectors leads to two main problems: huge dimensionality and sparse distribution. In this research, we encode full texts into string vectors, and apply the SVM to string vectors for text categorization.","PeriodicalId":181543,"journal":{"name":"5th ACIS International Conference on Software Engineering Research, Management & Applications (SERA 2007)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"5th ACIS International Conference on Software Engineering Research, Management & Applications (SERA 2007)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SERA.2007.97","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

This research proposes a new strategy where documents are encoded into string vectors for text categorization and modified versions of SVM to be adaptable to string vectors. Traditionally, when the supervised machine learning algorithms are used for pattern classification, raw data should be encoded into numerical vectors. This encoding may be difficult, depending on a given application area of pattern classification. For example, in text categorization, encoding full texts given as raw data into numerical vectors leads to two main problems: huge dimensionality and sparse distribution. In this research, we encode full texts into string vectors, and apply the SVM to string vectors for text categorization.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
适合于文本分类的基于核的学习
本研究提出了一种新的策略,将文档编码成字符串向量进行文本分类,并修改支持向量机的版本以适应字符串向量。传统上,当使用监督机器学习算法进行模式分类时,应将原始数据编码为数值向量。这种编码可能比较困难,这取决于模式分类的给定应用领域。例如,在文本分类中,将全文作为原始数据编码为数值向量会导致两个主要问题:巨大的维度和稀疏的分布。在本研究中,我们将全文编码为字符串向量,并将支持向量机应用于字符串向量进行文本分类。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Keynote Address 1 Keynote Address 2 A Preliminary Formal Specification of Virtual Organization Creation with RAISE Specification Language Efficient User Support with DACS Scheme A Framework for the Use of Six Sigma Tools in PSP/TSP
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1