Sarcasm Detection of Tweets in Indonesian Language Using Long Short-Term Memory
Suko Tyas Pernanda, Moh Edi Wibowo, N. Rokhman
2021 Sixth International Conference on Informatics and Computing (ICIC), 2021-11-03
DOI: https://doi.org/10.1109/ICIC54025.2021.9632886
Twitter is a massive source of information that can be mined for insights into public opinions, ideas, and circumstances. Extracting accurate information from tweets, however, is often challenging because tweets use informal, non-standard, and figurative language, including sarcasm. Sarcasm conveys a message using words whose literal meaning is the opposite of the intended one, so sarcasm detection is an important step when extracting information from public tweets. This research proposes using LSTMs to detect sarcastic tweets in the Indonesian language by extracting sentence-embedding features. LSTMs are known to learn sequential patterns in input data, so the features they extract are more representative than those hand-crafted by humans. The proposed LSTMs are combined with a Word2Vec model, which serves as a word encoder that preserves semantic meaning. The method is evaluated on tweets scraped from the Web using keywords related to popular topics. In the experiments, the proposed method achieves an accuracy of 82.13% and an F1-score of 61.31%, outperforming a conventional TF-IDF + naïve Bayes sarcasm detector. These results indicate that sentence embeddings yield features that are more accurate and more discriminative for sarcasm detection.
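The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: word vectors are fed one at a time into an LSTM cell, and the final hidden state is taken as the sentence embedding. Everything here is an assumption made for illustration: the three-dimensional `toy_vectors` table stands in for a trained Word2Vec model, the weights are random and untrained, the example tokens are invented, and the names `LSTMCell` and `sentence_embedding` are hypothetical rather than the authors' code.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class LSTMCell:
    """A standard LSTM cell with random (untrained) weights."""

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        width = hidden_dim + input_dim  # each gate sees [h_prev, x]
        self.weights = {
            name: [[rng.uniform(-0.5, 0.5) for _ in range(width)]
                   for _ in range(hidden_dim)]
            for name in self.GATES
        }
        self.biases = {name: [0.0] * hidden_dim for name in self.GATES}

    def step(self, x, h_prev, c_prev):
        z = h_prev + x  # list concatenation of previous hidden state and input
        pre = {
            name: [a + b for a, b in zip(matvec(self.weights[name], z),
                                         self.biases[name])]
            for name in self.GATES
        }
        i_gate = [sigmoid(v) for v in pre["input"]]
        f_gate = [sigmoid(v) for v in pre["forget"]]
        o_gate = [sigmoid(v) for v in pre["output"]]
        cand = [math.tanh(v) for v in pre["candidate"]]
        # New cell state: keep part of the old state, add part of the candidate.
        c = [f * cp + i * g for f, cp, i, g in zip(f_gate, c_prev, i_gate, cand)]
        h = [o * math.tanh(cv) for o, cv in zip(o_gate, c)]
        return h, c


def sentence_embedding(tokens, word_vectors, cell):
    """Run the tokens through the cell and return the final hidden state."""
    h = [0.0] * cell.hidden_dim
    c = [0.0] * cell.hidden_dim
    for token in tokens:
        x = word_vectors.get(token)
        if x is None:
            continue  # assumption: out-of-vocabulary tokens are skipped
        h, c = cell.step(x, h, c)
    return h


# Made-up word vectors standing in for a trained Word2Vec lookup.
toy_vectors = {
    "wah": [0.2, -0.1, 0.4],
    "macet": [-0.6, 0.3, 0.1],
    "lagi": [0.1, 0.5, -0.2],
    "hebat": [0.7, 0.2, 0.3],
    "sekali": [0.3, -0.4, 0.6],
}

cell = LSTMCell(input_dim=3, hidden_dim=4)
tokens = ["wah", "macet", "lagi", "hebat", "sekali"]
embedding = sentence_embedding(tokens, toy_vectors, cell)
print(embedding)
```

In a full detector, this embedding would be passed to a trained classification layer, and the cell weights would be learned from labeled tweets. Because the hidden state is updated token by token, the same words in a different order produce a different embedding, which is the sequential sensitivity that a bag-of-words representation such as TF-IDF lacks.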