Generating Entity Representation from Online Discussions: Challenges and an Evaluation Framework

T. C. Loures, Pedro O. S. Vaz de Melo, Adriano Veloso
Proceedings of the 23rd Brazilian Symposium on Multimedia and the Web, published 2017-10-17
DOI: 10.1145/3126858.3126882 (https://doi.org/10.1145/3126858.3126882)
Citations: 3

Abstract

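Because of the ubiquitous use of the Internet in current society, it is easy to find groups or communities of people discussing the most varied subjects. Learning about these subjects (or entities) from such discussions is of great interest to companies, organizations, public figures (e.g. politicians) and researchers alike. In this paper, we explore the problem of learning entity representations using online discussions about them as the only source of information. While such discussions may reveal relevant and surprising information about the corresponding subjects, they may also be completely irrelevant. As another challenge, while regular text documents usually contain well-structured language, online discussions often contain informal and misspelled words. Here we formally define the problem, propose a new benchmark for evaluating vector representation methods, and perform a deep evaluation of well-known techniques using three proposed evaluation scenarios: (i) clustering, (ii) ordering and (iii) recommendation. Results show that each method outperforms at least one other method in some evaluation scenario.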
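To make the setting concrete, the following is a minimal sketch of one way to build a vector per entity from its discussions and use it in a recommendation-style check. It is not the authors' method: the abstract does not name the techniques evaluated, so TF-IDF with cosine similarity is used here purely as a familiar stand-in. The entity names, the comments, and the helper functions `tokenize`, `entity_vectors`, `cosine` and `recommend` are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic runs; informal or misspelled
    # words (e.g. "awesum") are kept as ordinary tokens.
    return re.findall(r"[a-z']+", text.lower())


def entity_vectors(discussions):
    """Map each entity to a sparse TF-IDF vector built from all its comments."""
    counts = {
        entity: Counter(tok for comment in comments for tok in tokenize(comment))
        for entity, comments in discussions.items()
    }
    n = len(counts)
    # Document frequency: in how many entities each token appears.
    df = Counter(tok for c in counts.values() for tok in c)
    # Smoothed IDF (an assumption; other weightings are possible).
    idf = {tok: math.log((1 + n) / (1 + d)) + 1.0 for tok, d in df.items()}
    return {
        entity: {tok: tf * idf[tok] for tok, tf in c.items()}
        for entity, c in counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(entity, vectors, k=1):
    """Return the k entities whose vectors are most similar to `entity`."""
    scored = [
        (cosine(vectors[entity], vec), other)
        for other, vec in vectors.items()
        if other != entity
    ]
    return [other for _, other in sorted(scored, reverse=True)[:k]]


# Toy, invented data: each entity is described only by comments about it.
discussions = {
    "band_a": ["great guitar solo lol", "that album is awesum"],
    "band_b": ["the guitar on this album rocks", "best concert ever"],
    "team_c": ["what a goal!!", "the coach must go, terrible match"],
}
vectors = entity_vectors(discussions)
print(recommend("band_a", vectors))
```

In this toy data the two bands share vocabulary ("guitar", "album") while the team shares none with `band_a`, so the nearest neighbour of `band_a` is `band_b`. The same vectors could feed a clustering or ordering check; only the recommendation case is sketched here.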