社交媒体文本预处理技术对基于机器学习的分类器改进的有效性

L. Esnaola, Juan Pablo Tessore, Hugo Ramón, C. Russo
{"title":"社交媒体文本预处理技术对基于机器学习的分类器改进的有效性","authors":"L. Esnaola, Juan Pablo Tessore, Hugo Ramón, C. Russo","doi":"10.1109/CLEI47609.2019.235076","DOIUrl":null,"url":null,"abstract":"The language present in the context of social networks is usually more informal than the one used in traditional sources. The researches that take such content as input for machine learning based classifying algorithms, perform, as a first step, a cleaning and standardization process. The goal of the latter is to improve the accuracy of the classification. In this paper, several cleaning tasks are defined and executed over a dataset of comments extracted from the social network Facebook. The goal is to verify if the corrections, made by such tasks, produce a significant improvement in the accuracy reached by the classifying algorithms. The results obtained, indicate that, over this type of dataset, preprocessing tasks with a reasonably good performance in the correction of errors, do not necessarily produce a noteworthy improvement in the classification accuracy reached by the algorithms.","PeriodicalId":216193,"journal":{"name":"2019 XLV Latin American Computing Conference (CLEI)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Effectiveness of preprocessing techniques over social media texts for the improvement of machine learning based classifiers\",\"authors\":\"L. Esnaola, Juan Pablo Tessore, Hugo Ramón, C. Russo\",\"doi\":\"10.1109/CLEI47609.2019.235076\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The language present in the context of social networks is usually more informal than the one used in traditional sources. The researches that take such content as input for machine learning based classifying algorithms, perform, as a first step, a cleaning and standardization process. The goal of the latter is to improve the accuracy of the classification. In this paper, several cleaning tasks are defined and executed over a dataset of comments extracted from the social network Facebook. The goal is to verify if the corrections, made by such tasks, produce a significant improvement in the accuracy reached by the classifying algorithms. The results obtained, indicate that, over this type of dataset, preprocessing tasks with a reasonably good performance in the correction of errors, do not necessarily produce a noteworthy improvement in the classification accuracy reached by the algorithms.\",\"PeriodicalId\":216193,\"journal\":{\"name\":\"2019 XLV Latin American Computing Conference (CLEI)\",\"volume\":\"13 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 XLV Latin American Computing Conference (CLEI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CLEI47609.2019.235076\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 XLV Latin American Computing Conference (CLEI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CLEI47609.2019.235076","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

社交网络中使用的语言通常比传统资源中使用的语言更不正式。将这些内容作为基于机器学习的分类算法的输入的研究,作为第一步,执行一个清理和标准化过程。后者的目标是提高分类的准确性。在本文中,定义了几个清理任务,并在从社交网络Facebook提取的评论数据集上执行。目标是验证这些任务所做的修正是否能显著提高分类算法所达到的准确性。得到的结果表明,在这种类型的数据集上,具有相当好的纠错性能的预处理任务并不一定会使算法达到的分类精度有明显的提高。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Effectiveness of preprocessing techniques over social media texts for the improvement of machine learning based classifiers
The language present in the context of social networks is usually more informal than the one used in traditional sources. The researches that take such content as input for machine learning based classifying algorithms, perform, as a first step, a cleaning and standardization process. The goal of the latter is to improve the accuracy of the classification. In this paper, several cleaning tasks are defined and executed over a dataset of comments extracted from the social network Facebook. The goal is to verify if the corrections, made by such tasks, produce a significant improvement in the accuracy reached by the classifying algorithms. The results obtained, indicate that, over this type of dataset, preprocessing tasks with a reasonably good performance in the correction of errors, do not necessarily produce a noteworthy improvement in the classification accuracy reached by the algorithms.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
A Model for Detecting Conflicts and Dependencies in Non-Functional Requirements Using Scenarios and Use Cases Fusion of infrared and visible images using multiscale morphology Pentest on Internet of Things Devices Development of Emotional Intelligence in Computing Students: The “Experiencia 360°” Project Structuring a Folksonomy in a Community of Questions and Answers
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1