Hate Speech and Abusive Language Classification using fastText
Guntur Budi Herwanto, Annisa Maulida Ningtyas, Kurniawan Eka Nugraha, I. Nyoman Prayana Trisna
2019 International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), published 2019-12-01
DOI: 10.1109/ISRITI48646.2019.9034560
Hate speech is defined as utterances, writings, actions, or performances that are intended to incite violence or prejudice against a person on the basis of the characteristics of a particular group that he or she represents, such as race or ethnicity. In this study, we built a hate speech classification model using word representations from continuous bag of words (CBOW) and the fastText algorithm. This algorithm was chosen because it achieves good performance, especially on rare words, by making use of character-level information. Our results show that no single variation universally outperforms the others. In general, however, models that use pre-trained vectors from Wiki outperform models that do not use pre-trained vectors.
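To illustrate the character-level information mentioned above, the following is a minimal sketch of fastText-style subword extraction, in which each word is wrapped in `<` and `>` boundary markers and split into character n-grams. It is not the authors' code: the function names `char_ngrams` and `shared_ngrams`, the default n-gram range of 3 to 6, and the example words are assumptions chosen for illustration.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word`, fastText-style.

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    are distinguishable from word-internal character sequences.
    The 3..6 default range is an illustrative assumption.
    """
    wrapped = f"<{word}>"
    ngrams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the wrapped word.
        for start in range(len(wrapped) - n + 1):
            ngrams.append(wrapped[start:start + n])
    return ngrams


def shared_ngrams(word_a, word_b, min_n=3, max_n=6):
    """Return the sorted n-grams that two words have in common."""
    grams_a = set(char_ngrams(word_a, min_n, max_n))
    grams_b = set(char_ngrams(word_b, min_n, max_n))
    return sorted(grams_a & grams_b)


if __name__ == "__main__":
    # Trigrams of a single word.
    print(char_ngrams("where", 3, 3))
    # A hypothetical misspelling still overlaps with the standard spelling.
    print(shared_ngrams("hateful", "hatefull", 3, 3))
```

Because a rare or misspelled word shares most of its n-grams with more frequent words, a model that sums n-gram vectors can still assign it a meaningful representation, which is plausibly why subword information helps on noisy social media text.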