基于词嵌入的微博文本数据挖掘

Danielly Sorato, Renato Fileto
{"title":"基于词嵌入的微博文本数据挖掘","authors":"Danielly Sorato, Renato Fileto","doi":"10.1145/3330204.3330228","DOIUrl":null,"url":null,"abstract":"Microblog posts (e.g. tweets) often contain users opinions and thoughts about events, products, people, organizations, among other possibilities. However, the usage of social media to promote online disinformation and manipulation is not an uncommon occurrence. Analyzing the characteristics of such discourses in social media is essential for understanding and fighting such actions. Extracting recurrent fragments of text, i.e. word sequences, which are semantically similar can lead to the discovery of linguistic patterns used in certain kinds of discourse. Therefore, we aim to use such patterns to encapsulate frequent discourses textually expressed in microblog posts. In this paper, we propose to exploit linguistic patterns in the context of the 2016 United Estates presidential election. Through a technique that we call Short Semantic Pattern (SSP) mining, we were able to extract sequences of words that share a similar meaning in their word embedding representation. In the experiments we investigate the incidence of SSP instances regarding political adversaries and media in tweets posted by Donald Trump, during the presidential election campaign. Experimental results show a high preponderance of some statements of Donald Trump towards their adversaries and expressions that often appeared in such tweets.","PeriodicalId":348938,"journal":{"name":"Proceedings of the XV Brazilian Symposium on Information Systems","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Linguistic Pattern Mining for Data Analysis in Microblog Texts using Word Embeddings\",\"authors\":\"Danielly Sorato, Renato Fileto\",\"doi\":\"10.1145/3330204.3330228\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Microblog posts (e.g. tweets) often contain users opinions and thoughts about events, products, people, organizations, among other possibilities. However, the usage of social media to promote online disinformation and manipulation is not an uncommon occurrence. Analyzing the characteristics of such discourses in social media is essential for understanding and fighting such actions. Extracting recurrent fragments of text, i.e. word sequences, which are semantically similar can lead to the discovery of linguistic patterns used in certain kinds of discourse. Therefore, we aim to use such patterns to encapsulate frequent discourses textually expressed in microblog posts. In this paper, we propose to exploit linguistic patterns in the context of the 2016 United Estates presidential election. Through a technique that we call Short Semantic Pattern (SSP) mining, we were able to extract sequences of words that share a similar meaning in their word embedding representation. In the experiments we investigate the incidence of SSP instances regarding political adversaries and media in tweets posted by Donald Trump, during the presidential election campaign. Experimental results show a high preponderance of some statements of Donald Trump towards their adversaries and expressions that often appeared in such tweets.\",\"PeriodicalId\":348938,\"journal\":{\"name\":\"Proceedings of the XV Brazilian Symposium on Information Systems\",\"volume\":\"20 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-05-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the XV Brazilian Symposium on Information Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3330204.3330228\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the XV Brazilian Symposium on Information Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3330204.3330228","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

摘要

微博帖子(如tweets)通常包含用户对事件、产品、人物、组织等的看法和想法。然而,利用社交媒体促进在线虚假信息和操纵并不罕见。分析社交媒体中此类话语的特征对于理解和打击此类行为至关重要。提取文本中重复出现的片段,即语义相似的词序列,可以发现某些类型话语中使用的语言模式。因此,我们的目标是用这种模式来封装在微博中频繁表达的语篇。在本文中,我们建议在2016年美国总统选举的背景下利用语言模式。通过一种我们称为短语义模式(SSP)挖掘的技术,我们能够提取在单词嵌入表示中具有相似含义的单词序列。在实验中,我们调查了唐纳德·特朗普在总统竞选期间发布的推文中关于政治对手和媒体的SSP实例的发生率。实验结果显示,唐纳德·特朗普对对手的一些言论和经常出现在这类推文中的表达具有很高的优势。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Linguistic Pattern Mining for Data Analysis in Microblog Texts using Word Embeddings
Microblog posts (e.g. tweets) often contain users opinions and thoughts about events, products, people, organizations, among other possibilities. However, the usage of social media to promote online disinformation and manipulation is not an uncommon occurrence. Analyzing the characteristics of such discourses in social media is essential for understanding and fighting such actions. Extracting recurrent fragments of text, i.e. word sequences, which are semantically similar can lead to the discovery of linguistic patterns used in certain kinds of discourse. Therefore, we aim to use such patterns to encapsulate frequent discourses textually expressed in microblog posts. In this paper, we propose to exploit linguistic patterns in the context of the 2016 United Estates presidential election. Through a technique that we call Short Semantic Pattern (SSP) mining, we were able to extract sequences of words that share a similar meaning in their word embedding representation. In the experiments we investigate the incidence of SSP instances regarding political adversaries and media in tweets posted by Donald Trump, during the presidential election campaign. Experimental results show a high preponderance of some statements of Donald Trump towards their adversaries and expressions that often appeared in such tweets.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Outer-Tuning: an integration of rules, ontology and RDBMS Market Prediction in Criptocurrency: A Systematic Literature Mapping Machine learning techniques for code smells detection: an empirical experiment on a highly imbalanced setup Kairós LifeReview: A model for monitoring people with anxiety disorder
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1