A New Term Frequency with Gaussian Technique for Text Classification and Sentiment Analysis

IF 0.5 Q4 COMPUTER SCIENCE, INFORMATION SYSTEMS Journal of ICT Research and Applications Pub Date : 2021-10-07 DOI:10.5614/itbj.ict.res.appl.2021.15.2.4
Vuttichai Vichianchai, Sumonta Kasemvilas
{"title":"A New Term Frequency with Gaussian Technique for Text Classification and Sentiment Analysis","authors":"Vuttichai Vichianchai, Sumonta Kasemvilas","doi":"10.5614/itbj.ict.res.appl.2021.15.2.4","DOIUrl":null,"url":null,"abstract":"This paper proposes a new term frequency with a Gaussian technique (TF-G) to classify the risk of suicide from Thai clinical notes and to perform sentiment analysis based on Thai customer reviews and English tweets of travelers that use US airline services. This research compared TF-G with term weighting techniques based on Thai text classification methods from previous researches, including the bag-of-words (BoW), term frequency (TF), term frequency-inverse document frequency (TF-IDF), and term frequency-inverse corpus document frequency (TF-ICF) techniques. Suicide risk classification and sentiment analysis were performed with the decision tree (DT), naïve Bayes (NB), support vector machine (SVM), random forest (RF), and multilayer perceptron (MLP) techniques. The experimental results showed that TF-G is appropriate for feature extraction to classify the risk of suicide and to analyze the sentiments of customer reviews and tweets of travelers. The TF-G technique was more accurate than BoW, TF, TF-IDF and TF-ICF for term weighting in Thai suicide risk classification, for term weighting in sentiment analysis of Thai customer reviews for Burger King, Pizza Hut, and Sizzler restaurants, and for the sentiment analysis of English tweets of travelers using US airline services.","PeriodicalId":42785,"journal":{"name":"Journal of ICT Research and Applications","volume":" ","pages":""},"PeriodicalIF":0.5000,"publicationDate":"2021-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of ICT Research and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5614/itbj.ict.res.appl.2021.15.2.4","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 2

Abstract

This paper proposes a new term frequency with a Gaussian technique (TF-G) to classify the risk of suicide from Thai clinical notes and to perform sentiment analysis based on Thai customer reviews and English tweets of travelers that use US airline services. This research compared TF-G with term weighting techniques based on Thai text classification methods from previous researches, including the bag-of-words (BoW), term frequency (TF), term frequency-inverse document frequency (TF-IDF), and term frequency-inverse corpus document frequency (TF-ICF) techniques. Suicide risk classification and sentiment analysis were performed with the decision tree (DT), naïve Bayes (NB), support vector machine (SVM), random forest (RF), and multilayer perceptron (MLP) techniques. The experimental results showed that TF-G is appropriate for feature extraction to classify the risk of suicide and to analyze the sentiments of customer reviews and tweets of travelers. The TF-G technique was more accurate than BoW, TF, TF-IDF and TF-ICF for term weighting in Thai suicide risk classification, for term weighting in sentiment analysis of Thai customer reviews for Burger King, Pizza Hut, and Sizzler restaurants, and for the sentiment analysis of English tweets of travelers using US airline services.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于高斯技术的文本分类和情感分析新词频
本文提出了一种新的术语频率高斯技术(TF-G),用于从泰国临床笔记中对自杀风险进行分类,并基于泰国客户评论和使用美国航空公司服务的旅行者的英语推文进行情绪分析。本研究将TF-G与基于先前研究的泰语文本分类方法的术语加权技术进行了比较,包括词袋(BoW)、术语频率(TF)、术语-频率逆文档频率(TF-IDF)和术语-频率反语料库文档频率(TF-ICF)技术。采用决策树(DT)、朴素贝叶斯(NB)、支持向量机(SVM)、随机森林(RF)和多层感知器(MLP)技术进行自杀风险分类和情绪分析。实验结果表明,TF-G适合于特征提取,用于对自杀风险进行分类,并分析旅行者的客户评论和推文情绪。TF-G技术在泰国自杀风险分类中的术语权重,在汉堡王、必胜客和Sizzler餐厅的泰国顾客评论情绪分析中的术语加权,以及在使用美国航空公司服务的旅行者的英语推文情绪分析中,都比BoW、TF、TF-IDF和TF-ICF更准确。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Journal of ICT Research and Applications
Journal of ICT Research and Applications COMPUTER SCIENCE, INFORMATION SYSTEMS-
CiteScore
1.60
自引率
0.00%
发文量
13
审稿时长
24 weeks
期刊介绍: Journal of ICT Research and Applications welcomes full research articles in the area of Information and Communication Technology from the following subject areas: Information Theory, Signal Processing, Electronics, Computer Network, Telecommunication, Wireless & Mobile Computing, Internet Technology, Multimedia, Software Engineering, Computer Science, Information System and Knowledge Management. Authors are invited to submit articles that have not been published previously and are not under consideration elsewhere.
期刊最新文献
Smart Card-based Access Control System using Isolated Many-to-Many Authentication Scheme for Electric Vehicle Charging Stations The Evaluation of DyHATR Performance for Dynamic Heterogeneous Graphs Machine Learning-based Early Detection and Prognosis of the Covid-19 Pandemic Improving Robustness Using MixUp and CutMix Augmentation for Corn Leaf Diseases Classification based on ConvMixer Architecture Generative Adversarial Networks Based Scene Generation on Indian Driving Dataset
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1