L. Esnaola, Juan Pablo Tessore, Hugo Ramón, C. Russo
Published in: 2019 XLV Latin American Computing Conference (CLEI), 2019-09-01. DOI: https://doi.org/10.1109/CLEI47609.2019.235076
Effectiveness of preprocessing techniques over social media texts for the improvement of machine learning based classifiers
Language on social networks is usually more informal than language in traditional sources. Studies that feed such content to machine-learning-based classification algorithms typically begin with a cleaning and standardization step, whose goal is to improve classification accuracy. In this paper, several cleaning tasks are defined and applied to a dataset of comments extracted from the social network Facebook, in order to verify whether the corrections made by these tasks significantly improve the accuracy reached by the classifiers. The results indicate that, on this type of dataset, preprocessing tasks that correct errors reasonably well do not necessarily produce a noteworthy improvement in classification accuracy.
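The abstract does not list the cleaning tasks themselves, so the following is only a minimal sketch of what such a step might look like. The three rules (lowercasing, URL removal, collapsing characters repeated three or more times), the function names `clean_comment` and `vocabulary`, and the sample comments are all assumptions made for illustration, not the authors' method or data.

```python
import re

# Hypothetical cleaning rules; the paper's actual tasks are not given in the abstract.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # any character repeated 3 or more times
SPACE_RE = re.compile(r"\s+")


def clean_comment(text: str) -> str:
    """Apply a few illustrative cleaning tasks to one comment."""
    text = text.lower()
    text = URL_RE.sub(" ", text)        # drop links
    text = REPEAT_RE.sub(r"\1", text)   # "holaaaa" -> "hola"
    return SPACE_RE.sub(" ", text).strip()


def vocabulary(comments):
    """Set of whitespace-separated tokens across all comments."""
    return {token for c in comments for token in c.split()}


# Made-up sample comments, for demonstration only.
raw = ["Holaaaa   amigos!!! http://example.com", "hola AMIGOS!"]
cleaned = [clean_comment(c) for c in raw]

print(cleaned)
print(len(vocabulary(raw)), "->", len(vocabulary(cleaned)))
```

Comparing a classifier trained on `raw` against one trained on `cleaned` would then be the kind of check the abstract describes. Note that a rule this blunt can also damage legitimate tokens (the repeat rule turns "1000" into "10"), so fewer distinct tokens does not by itself imply better input.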