Generating Entity Representation from Online Discussions: Challenges and an Evaluation Framework

Proceedings of the 23rd Brazilian Symposium on Multimedia and the Web. Pub Date: 2017-10-17. DOI: 10.1145/3126858.3126882

T. C. Loures, Pedro O. S. Vaz de Melo, Adriano Veloso


Citations: 3

Abstract

Because of the ubiquitous use of the Internet in current society, it is easy to find groups or communities of people discussing the most varied subjects. Learning about these subjects (or entities) from such discussions is of great interest to companies, organizations, public figures (e.g. politicians) and researchers alike. In this paper, we explore the problem of learning entity representations using online discussions about them as the only source of information. While such discussions may reveal relevant and surprising information about the corresponding subjects, they may also be completely irrelevant. As another challenge, while regular text documents usually contain well-structured language, online discussions often contain informal and misspelled words. Here we formally define the problem, propose a new benchmark for evaluating vector representation methods, and perform a deep evaluation of well-known techniques using three proposed evaluation scenarios: (i) clustering, (ii) ordering and (iii) recommendation. Results show that each method is better than at least one other in some evaluation.
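The abstract does not say which representation methods or benchmark data the paper uses, so the following is only a minimal sketch of the general idea: pool the comments about each entity, turn them into a vector, and compare entities by similarity. TF-IDF weighting, cosine similarity, the toy `discussions` data and all function names here are illustrative assumptions, not the authors' method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic runs; informal spellings are kept as-is.
    return re.findall(r"[a-z']+", text.lower())


def entity_vectors(discussions):
    """Map each entity to a sparse TF-IDF vector built from its comments."""
    # Term counts per entity, pooling every comment about that entity.
    counts = {
        entity: Counter(tok for comment in comments for tok in tokenize(comment))
        for entity, comments in discussions.items()
    }
    n = len(counts)
    # Document frequency: number of entities whose comments contain the term.
    df = Counter(term for c in counts.values() for term in c)
    # Smoothed IDF; a term that appears for every entity gets weight 0.
    return {
        entity: {t: tf * math.log((1 + n) / (1 + df[t])) for t, tf in c.items()}
        for entity, c in counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(entity, vectors):
    """Order all other entities by decreasing similarity to `entity`."""
    others = [e for e in vectors if e != entity]
    return sorted(others, key=lambda e: cosine(vectors[entity], vectors[e]),
                  reverse=True)


# Hypothetical toy data: entity -> comments from online discussions about it.
discussions = {
    "guitar": ["love the sound of these strings",
               "gr8 chords, strings too loud tho"],
    "piano": ["the keys and chords sound amazing",
              "chords are hard on these keys"],
    "soccer": ["what a goal last nite",
               "the team played a great match"],
}

vectors = entity_vectors(discussions)
print(most_similar("guitar", vectors))
```

An ordering like the one returned by `most_similar` is the kind of output that an ordering or recommendation scenario could score against a reference, while the same vectors could be fed to any clustering algorithm for a clustering scenario. Note that misspellings such as "nite" simply become separate vocabulary terms here, which illustrates the informal-language challenge the abstract mentions.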


