Identifying Abusive Comments in Hebrew Facebook
Chaya Liebeskind, Shmuel Liebeskind
2018 IEEE International Conference on the Science of Electrical Engineering in Israel (ICSEE), 2018-12-01
DOI: https://doi.org/10.1109/ICSEE.2018.8646190
Citations: 13
Abstract
In this study, we aim to classify comments as abusive or non-abusive. We develop a Hebrew corpus of user comments annotated for abusive language. We then investigate highly sparse n-gram representations, as well as denser character n-gram representations, for comment abuse classification. Since comments in social media are usually short, we also investigate four dimensionality reduction methods, which produce word vectors that collapse similar words into groups. We show that the character n-gram representations outperform all the other representations for the task of identifying abusive comments.
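
The contrast between sparse n-gram features and denser character n-gram features can be illustrated with a minimal sketch. This is not the authors' pipeline: the helper names (`word_ngrams`, `char_ngrams`, `jaccard`), the choice of trigrams, and the two English stand-in comments are all assumptions made for illustration. It only shows why short, noisily spelled comments may share character n-grams even when they share no whole words.

```python
from collections import Counter


def word_ngrams(text: str, n: int) -> Counter:
    """Count word n-grams after whitespace tokenization."""
    tokens = text.split()
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def char_ngrams(text: str, n: int) -> Counter:
    """Count overlapping character n-grams, spaces included."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def jaccard(a: Counter, b: Counter) -> float:
    """Fraction of distinct features that the two feature sets share."""
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 0.0


# Hypothetical stand-ins for two short comments that differ only in spelling.
first = "you are stupid"
second = "u r stuuupid"

word_overlap = jaccard(word_ngrams(first, 1), word_ngrams(second, 1))
char_overlap = jaccard(char_ngrams(first, 3), char_ngrams(second, 3))

print(f"word unigram overlap: {word_overlap:.2f}")
print(f"char trigram overlap: {char_overlap:.2f}")
```

In this toy pair the word-level feature sets are disjoint, while the character trigrams still overlap on fragments such as `stu` and `pid`. A classifier trained on character n-grams therefore has some shared evidence to work with, which is one plausible reading of why such features suit short social media comments.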