Linguistic Pattern Mining for Data Analysis in Microblog Texts using Word Embeddings

Proceedings of the XV Brazilian Symposium on Information Systems | Pub Date: 2019-05-20 | DOI: 10.1145/3330204.3330228

Danielly Sorato, Renato Fileto


Citations: 2

Abstract


Linguistic Pattern Mining for Data Analysis in Microblog Texts using Word Embeddings

Microblog posts (e.g., tweets) often contain users' opinions and thoughts about events, products, people, and organizations, among other topics. However, social media is also frequently used to promote online disinformation and manipulation. Analyzing the characteristics of such discourses in social media is essential for understanding and fighting such actions. Extracting recurrent fragments of text, i.e., word sequences that are semantically similar, can lead to the discovery of linguistic patterns used in certain kinds of discourse. Therefore, we aim to use such patterns to encapsulate frequent discourses textually expressed in microblog posts. In this paper, we propose to exploit linguistic patterns in the context of the 2016 United States presidential election. Through a technique that we call Short Semantic Pattern (SSP) mining, we were able to extract sequences of words that share a similar meaning in their word embedding representation. In the experiments, we investigate the incidence of SSP instances regarding political adversaries and the media in tweets posted by Donald Trump during the presidential election campaign. Experimental results show a high prevalence of certain statements by Donald Trump about his adversaries, as well as expressions that often appeared in such tweets.
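The abstract does not specify how SSP mining compares word sequences, so the following is only a minimal sketch of the general idea of grouping word sequences by similarity in embedding space. It assumes averaged word vectors and a cosine-similarity threshold, neither of which is confirmed as the authors' method. The toy vectors, the function names (`ngrams`, `sequence_vector`, `similar_sequences`), and the threshold value are all hypothetical.

```python
import math
from itertools import combinations

# Toy 3-dimensional word vectors, invented for illustration only.
# A real study would load pretrained embeddings instead.
TOY_EMBEDDINGS = {
    "crooked": [0.9, 0.1, 0.0],
    "dishonest": [0.85, 0.15, 0.05],
    "media": [0.1, 0.9, 0.0],
    "press": [0.12, 0.88, 0.02],
    "great": [0.0, 0.1, 0.9],
    "rally": [0.1, 0.0, 0.8],
}


def ngrams(tokens, n):
    """Return all contiguous word sequences of length n."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sequence_vector(seq, embeddings):
    """Represent a word sequence by the average of its word vectors (an assumption)."""
    vectors = [embeddings[word] for word in seq]
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_sequences(posts, embeddings, n=2, threshold=0.95):
    """Return pairs of distinct n-grams whose averaged vectors are close."""
    seqs = set()
    for post in posts:
        # Words without an embedding are dropped before n-grams are formed,
        # a simplification that can make non-adjacent words look adjacent.
        tokens = [t for t in post.lower().split() if t in embeddings]
        seqs.update(ngrams(tokens, n))
    pairs = []
    for a, b in combinations(sorted(seqs), 2):
        sim = cosine(sequence_vector(a, embeddings), sequence_vector(b, embeddings))
        if sim >= threshold:
            pairs.append((a, b))
    return pairs


posts = ["crooked media", "dishonest press", "great rally"]
print(similar_sequences(posts, TOY_EMBEDDINGS))
```

On these toy vectors the sketch pairs "crooked media" with "dishonest press" and leaves "great rally" unmatched. Averaging ignores word order, so a sequence-aware comparison may be preferable in practice.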

