Sarcasm Detection of Tweets in Indonesian Language Using Long Short-Term Memory

2021 Sixth International Conference on Informatics and Computing (ICIC), Pub Date: 2021-11-03, DOI: 10.1109/ICIC54025.2021.9632886

Suko Tyas Pernanda, Moh Edi Wibowo, N. Rokhman


Citations: 0

Abstract


Sarcasm Detection of Tweets in Indonesian Language Using Long Short-Term Memory

Twitter is a massive source of information that can potentially be used to obtain valuable insights into public opinions, ideas, and circumstances. Extracting accurate information from tweets, however, is often challenging because of informal, non-standard, and figurative language, including sarcasm. Sarcasm conveys a message using words whose literal meaning is the opposite of what is intended. Sarcasm detection is therefore an important task when extracting information from public tweets. This research proposes the use of LSTMs to detect sarcastic tweets in the Indonesian language through the extraction of sentence-embedding features. LSTMs are known to learn sequential patterns in input data, so the features they extract are more representative than features hand-crafted by humans. The proposed LSTMs are combined with a Word2Vec model that serves as a word encoder preserving semantic meaning. The proposed method is evaluated on tweets scraped from the Web using keywords related to popular topics. The experimental results demonstrate that the proposed method achieves an accuracy of 82.13% and an F1-score of 61.31%, outperforming the conventional TF-IDF + naïve Bayes sarcasm detector. These results thus prove that sentence embedding is able to extract features that are more accurate and more discriminative for sarcasm detection.
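The abstract does not give the network configuration, so the following is only a minimal sketch of the general idea: word vectors are fed one at a time through an LSTM, and the final hidden state is taken as the sentence embedding. Everything specific here is an assumption for illustration and not the authors' implementation: the toy three-dimensional vectors stand in for real Word2Vec output, the weights are random and untrained, and the names `init_lstm`, `lstm_sentence_embedding`, and `toy_word2vec` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, x, b):
    """Return W @ x + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def init_lstm(input_dim, hidden_dim, seed=0):
    """Random, untrained weights for the four LSTM gates (illustration only)."""
    rng = random.Random(seed)

    def matrix():
        return [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                for _ in range(hidden_dim)]

    # i: input gate, f: forget gate, o: output gate, g: candidate cell state
    return {name: (matrix(), [0.0] * hidden_dim) for name in ("i", "f", "o", "g")}


def gate(params, name, z, activation):
    W, b = params[name]
    return [activation(v) for v in affine(W, z, b)]


def lstm_sentence_embedding(word_vectors, params, hidden_dim):
    """Run an LSTM over the word vectors; return the final hidden state."""
    h = [0.0] * hidden_dim  # hidden state
    c = [0.0] * hidden_dim  # cell state
    for x in word_vectors:
        z = list(x) + h  # current word vector concatenated with previous hidden state
        i = gate(params, "i", z, sigmoid)
        f = gate(params, "f", z, sigmoid)
        o = gate(params, "o", z, sigmoid)
        g = gate(params, "g", z, math.tanh)
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h


# Made-up vectors standing in for Word2Vec embeddings (assumption, not real data).
toy_word2vec = {
    "bagus": [0.9, 0.1, 0.0],
    "banget": [0.2, 0.8, 0.1],
    "macet": [-0.7, 0.3, 0.5],
}

tweet = ["bagus", "banget", "macet"]
vectors = [toy_word2vec[token] for token in tweet]
params = init_lstm(input_dim=3, hidden_dim=4)
embedding = lstm_sentence_embedding(vectors, params, hidden_dim=4)
print(embedding)
```

In a full classifier, this embedding would typically be passed to a small output layer trained to predict sarcastic versus non-sarcastic. Because the recurrence depends on word order, the same words in a different order generally produce a different embedding, which is the property a bag-of-words representation such as TF-IDF lacks.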

