Luca Bacco, Andrea Cimino, L. Paulon, M. Merone, F. Dell’Orletta
{"title":"A Machine Learning approach for Sentiment Analysis for Italian Reviews in Healthcare","authors":"Luca Bacco, Andrea Cimino, L. Paulon, M. Merone, F. Dell’Orletta","doi":"10.4000/books.aaccademia.8225","DOIUrl":null,"url":null,"abstract":"In this paper, we present our approach to the task of binary sentiment classification for Italian reviews in healthcare domain. We first collected a new dataset for such domain. Then, we compared the results obtained by two different systems, one including a Support Vector Machine and one with BERT. For the first one, we linguistic pre–processed the dataset to extract hand-crafted features exploited by the classifier. For the second one, we oversampled the dataset to achieve better results. Our results show that the SVMbased system, without the worry of having to oversample, has better performance than the BERT-based one, achieving an F1-score of 91.21%.","PeriodicalId":300279,"journal":{"name":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","volume":"121 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4000/books.aaccademia.8225","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
Abstract
In this paper, we present our approach to the task of binary sentiment classification for Italian reviews in healthcare domain. We first collected a new dataset for such domain. Then, we compared the results obtained by two different systems, one including a Support Vector Machine and one with BERT. For the first one, we linguistic pre–processed the dataset to extract hand-crafted features exploited by the classifier. For the second one, we oversampled the dataset to achieve better results. Our results show that the SVMbased system, without the worry of having to oversample, has better performance than the BERT-based one, achieving an F1-score of 91.21%.