T. C. Loures, Pedro O. S. Vaz de Melo, Adriano Veloso
Generating Entity Representation from Online Discussions: Challenges and an Evaluation Framework
Proceedings of the 23rd Brazilian Symposium on Multimedia and the Web, 2017-10-17
DOI: https://doi.org/10.1145/3126858.3126882
Citations: 3
Abstract
Because of the ubiquitous use of the Internet in current society, it is easy to find groups or communities of people discussing the most varied subjects. Learning about these subjects (or entities) from such discussions is of great interest to companies, organizations, public figures (e.g. politicians) and researchers alike. In this paper, we explore the problem of learning entity representations using online discussions about them as the only source of information. While such discussions may reveal relevant and surprising information about the corresponding subjects, they may also be completely irrelevant. As another challenge, while regular text documents usually use well-structured language, online discussions often contain informal and misspelled words. Here we formally define the problem, propose a new benchmark for evaluating vector representation methods, and perform a deep evaluation of well-known techniques using three proposed evaluation scenarios: (i) clustering, (ii) ordering and (iii) recommendation. Results show that each method outperforms at least one other in some evaluation.