Improving K Nearest Neighbor into String Vector Version for Text Categorization

T. Jo
{"title":"Improving K Nearest Neighbor into String Vector Version for Text Categorization","authors":"T. Jo","doi":"10.23919/ICACT.2019.8702043","DOIUrl":null,"url":null,"abstract":"This research is concerned with the string vector based version of the KNN which is the approach to the text categorization. Traditionally, texts have been encoded into numerical vectors for using the traditional version of KNN, and encoding so leads to the three main problems: huge dimensionality, sparse distribution, and poor transparency. In order to solve the problems, this research propose that texts should be encoded into string vectors the similarity measure between string vectors is defined, and the KNN is modified into the version where string vector is given its input. The proposed KNN version is validated empirically by comparing it with the traditional KNN version on the three collections: NewsPage.com, Opiniopsis, and 20NewsGroups. The goal of this research is to improve the text categorization performance by solving them.","PeriodicalId":226261,"journal":{"name":"2019 21st International Conference on Advanced Communication Technology (ICACT)","volume":"107 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 21st International Conference on Advanced Communication Technology (ICACT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/ICACT.2019.8702043","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

This research is concerned with the string vector based version of the KNN which is the approach to the text categorization. Traditionally, texts have been encoded into numerical vectors for using the traditional version of KNN, and encoding so leads to the three main problems: huge dimensionality, sparse distribution, and poor transparency. In order to solve the problems, this research propose that texts should be encoded into string vectors the similarity measure between string vectors is defined, and the KNN is modified into the version where string vector is given its input. The proposed KNN version is validated empirically by comparing it with the traditional KNN version on the three collections: NewsPage.com, Opiniopsis, and 20NewsGroups. The goal of this research is to improve the text categorization performance by solving them.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
改进K近邻为字符串向量版本的文本分类
本文研究了基于字符串向量的KNN算法,这是一种文本分类方法。传统上,为了使用传统版本的KNN,文本被编码成数值向量,这样编码会导致三个主要问题:巨大的维度、稀疏的分布和较差的透明度。为了解决这一问题,本研究提出将文本编码为字符串向量,定义字符串向量之间的相似性度量,并将KNN修改为给定字符串向量输入的版本。通过将所提出的KNN版本与传统的KNN版本在三个集合(NewsPage.com、Opiniopsis和20NewsGroups)上进行比较,对所提出的KNN版本进行了实证验证。本研究的目的是通过解决这些问题来提高文本分类的性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
A Novel Ranging Code based on improved Logistic Map Chaotic Sequences A Learning Kit on IPv6 Deployment and its Security Challenges for Neophytes Cybercrime Countermeasure of Insider Threat Investigation A Novel Ultra-Wideband Antenna Operating in the frequency band of 2.5-40GHz Modelling Chlorophyll-a Concentration using Deep Neural Networks considering Extreme Data Imbalance and Skewness
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1