Sarcasm Detection of Tweets in Indonesian Language Using Long Short-Term Memory

Suko Tyas Pernanda, Moh Edi Wibowo, N. Rokhman
2021 Sixth International Conference on Informatics and Computing (ICIC), published 3 November 2021. DOI: 10.1109/ICIC54025.2021.9632886
Citations: 0

Abstract

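Twitter is a massive source of information that can be used to obtain valuable insights into public opinions, ideas, and circumstances. Extracting accurate information from tweets, however, is often challenging because of informal, non-standard, and figurative language, including sarcasm. Sarcasm conveys a message using words whose literal meaning is the opposite of what is intended. Sarcasm detection is therefore an important step in extracting information from public tweets. This research proposes using LSTMs to detect sarcastic tweets in the Indonesian language by extracting sentence-embedding features. LSTMs are known to learn sequential patterns in input data, so the features they extract are more representative than features hand-crafted by humans. The proposed LSTMs are combined with a Word2Vec model, which serves as a word encoder that preserves semantic meaning. The method is evaluated on tweets scraped from the Web using keywords related to popular topics. The experimental results show that the proposed method achieves an accuracy of 82.13% and an F1-score of 61.31%, outperforming a conventional TF-IDF + naïve Bayes sarcasm detector. These results show that sentence embeddings extract features that are more accurate and more discriminative for sarcasm detection.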
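The abstract describes the pipeline only at a high level: Word2Vec word vectors are fed through an LSTM, and the resulting sentence embedding is used to classify a tweet as sarcastic or not. Below is a minimal, standard-library-only Python sketch of that forward pass under stated assumptions: a single-layer LSTM whose final hidden state is taken as the sentence embedding, a logistic output layer, and a hypothetical toy embedding table standing in for a trained Word2Vec model. The names (`LSTMEncoder`, `embed`, `sarcasm_probability`), layer sizes, weights, and example tokens are illustrative and untrained; none of them come from the paper, and training by backpropagation is omitted.

```python
import math
import random
from typing import Dict, List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w: List[Vector], v: Vector) -> Vector:
    # Plain matrix-vector product, so the sketch needs only the standard library.
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


class LSTMEncoder:
    """Hypothetical single-layer LSTM; its final hidden state is the sentence embedding."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One weight matrix and bias per gate, applied to the concatenation [h_{t-1}; x_t].
        # Weights are random (untrained); real use would learn them by backpropagation.
        self.params = {
            gate: (
                [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim + input_dim)]
                 for _ in range(hidden_dim)],
                [0.0] * hidden_dim,
            )
            for gate in ("input", "forget", "output", "cell")
        }

    def _gate(self, name: str, z: Vector) -> Vector:
        w, b = self.params[name]
        return [s + bi for s, bi in zip(matvec(w, z), b)]

    def encode(self, sequence: List[Vector]) -> Vector:
        h = [0.0] * self.hidden_dim  # hidden state
        c = [0.0] * self.hidden_dim  # cell state
        for x in sequence:
            z = h + x  # list concatenation [h_{t-1}; x_t]
            i = [sigmoid(v) for v in self._gate("input", z)]
            f = [sigmoid(v) for v in self._gate("forget", z)]
            o = [sigmoid(v) for v in self._gate("output", z)]
            cand = [math.tanh(v) for v in self._gate("cell", z)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h


def embed(tokens: List[str], table: Dict[str, Vector], dim: int) -> List[Vector]:
    # Stand-in for a trained Word2Vec lookup; out-of-vocabulary words become zero vectors.
    return [table.get(tok, [0.0] * dim) for tok in tokens]


def sarcasm_probability(tokens: List[str], table: Dict[str, Vector], dim: int,
                        encoder: LSTMEncoder, w_out: Vector, b_out: float) -> float:
    # Logistic output layer on top of the LSTM sentence embedding.
    h = encoder.encode(embed(tokens, table, dim))
    return sigmoid(sum(wi * hi for wi, hi in zip(w_out, h)) + b_out)


if __name__ == "__main__":
    EMB_DIM, HIDDEN = 4, 3
    rng = random.Random(42)
    # Hypothetical toy vectors; a real system would use Word2Vec trained on Indonesian tweets.
    tokens = ["wah", "hebat", "banget", "macetnya"]
    table = {w: [rng.uniform(-1, 1) for _ in range(EMB_DIM)] for w in tokens}
    encoder = LSTMEncoder(EMB_DIM, HIDDEN)
    p = sarcasm_probability(tokens, table, EMB_DIM, encoder, [0.5, -0.3, 0.8], 0.0)
    print(f"P(sarcastic) = {p:.3f}")
```

Taking the last hidden state is one common way to turn a variable-length sequence into a fixed-length vector; mean- or max-pooling over all hidden states is an alternative, and the abstract does not say which one the authors used.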