Text mining based on tax comments as big data analysis using SVM and feature selection

Mihuandayani, Ema Utami, E. T. Luthfi
Published in: 2018 International Conference on Information and Communications Technology (ICOIACT), pp. 537-542
Publication date: 2018-03-01
DOI: 10.1109/ICOIACT.2018.8350743 (https://doi.org/10.1109/ICOIACT.2018.8350743)
Citations: 8

Abstract

Taxes play an important role in the economy and development of a country. The taxation service system is continuously improved in order to increase the State Budget. One way to assess the performance of taxation, particularly in Indonesia, is to learn the opinion of the public as the recipients of the service. Text mining can be used to determine public opinion about the tax system. The rapid growth of social media data motivates this research to use it as a source for big data analysis. The dataset consists of tax-related comments collected from Facebook and Twitter. The resulting opinions, expressed as public sentiment about service, the website system, and news, can be used as input for improving the quality of tax services. In this research, text mining proceeds through the phases of text processing, feature selection, and classification with a Support Vector Machine (SVM). To reduce the number of attributes in the dataset for text classification, feature selection uses Information Gain to select the terms relevant to the tax topic. Testing measures the performance of SVM with feature selection on the two data sources. Performance is measured using precision, recall, and F-measure.
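The abstract names Information Gain for feature selection and precision, recall, and F-measure for evaluation, but gives no formulas or code. The following is a minimal sketch of those two steps only, using the Python standard library. It is not the authors' implementation: the function names, the representation of each comment as a set of tokens, the class labels, and the toy comments are all assumptions made for illustration. The SVM training step is not shown; in practice it would likely come from an existing library.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Information Gain of the binary feature 'term occurs in the document'.

    docs is a list of token sets, labels the matching class labels.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


def select_terms(docs, labels, k):
    """Return the k terms with the highest Information Gain."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab,
                    key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return ranked[:k]


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F-measure (F1) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical, hand-made comments (already tokenized); not data from the paper.
toy_docs = [
    {"tax", "service", "slow"},
    {"tax", "website", "error"},
    {"tax", "service", "fast"},
    {"tax", "website", "helpful"},
]
toy_labels = ["neg", "neg", "pos", "pos"]

selected = select_terms(toy_docs, toy_labels, k=4)
print("Selected terms:", selected)
```

In this toy data the word "tax" appears in every comment, so it carries no information about the class and is ranked below terms that occur in only one class. The selected terms would then define the feature vectors passed to the classifier, and `precision_recall_f1` would be applied to its predictions on held-out comments.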