Discovering Entities with Just a Little Help from You

Jaspreet Singh, Johannes Hoffart, Avishek Anand
Published in: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
Publication date: 2016-10-24
DOI: 10.1145/2983323.2983798 (https://doi.org/10.1145/2983323.2983798)
Citations: 14

Abstract

Linking entities such as people, organizations, books, music groups and their songs in text to knowledge bases (KBs) is a fundamental task for many downstream search and mining applications. Achieving high disambiguation accuracy depends crucially on a rich and holistic representation of the entities in the KB. For popular entities, such a representation can easily be mined from Wikipedia, and many current entity disambiguation and linking methods make use of this fact. However, Wikipedia does not contain long-tail entities that only a few people are interested in, and at times it lags behind in adding newly emerging entities. For such entities, mining a suitable representation in a fully automated fashion is very difficult, resulting in poor linking accuracy. What can be mined automatically, though, is a high-quality representation given the context of a new entity occurring in any text. Due to the lack of knowledge about the entity, no method can retrieve these occurrences automatically with high precision, resulting in a chicken-and-egg problem. To address this, our approach automatically generates candidate occurrences of entities and prompts the user for feedback on whether each occurrence refers to the entity in question. This feedback gradually improves the knowledge about the entity and allows our methods to provide better candidate suggestions, which keeps the user engaged. We propose novel human-in-the-loop retrieval methods for generating candidates based on gradient interleaving of diversification and textual relevance approaches. We conducted extensive experiments on the FACC dataset, showing that our approaches convincingly outperform carefully selected baselines in both intrinsic and extrinsic measures while keeping users engaged.
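
The feedback loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify how the gradient interleaving works, so the sketch simply alternates between a relevance pick and a diversification pick. The names `tokenize`, `relevance`, `novelty`, `discover` and `oracle`, the term-count entity profile, and both scoring functions are all assumptions made for illustration.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer that keeps alphabetic tokens only."""
    return [t for t in text.lower().split() if t.isalpha()]


def relevance(profile, text):
    """Textual relevance (assumed): sum of profile weights over distinct terms."""
    return sum(profile[t] for t in set(tokenize(text)))


def novelty(text, shown):
    """Diversification (assumed): share of distinct terms not yet shown to the user."""
    terms = set(tokenize(text))
    if not terms:
        return 0.0
    seen = set()
    for s in shown:
        seen.update(tokenize(s))
    return len(terms - seen) / len(terms)


def discover(seed, pool, oracle, rounds):
    """Human-in-the-loop sketch: suggest one candidate per round, ask for feedback.

    seed   -- short description of the new entity (hypothetical input)
    pool   -- candidate text occurrences that might mention the entity
    oracle -- callable standing in for the user; returns True to confirm
    """
    profile = Counter(tokenize(seed))  # entity representation built so far
    pool = list(pool)
    shown, accepted = [], []
    for r in range(rounds):
        if not pool:
            break
        if r % 2 == 0:
            # Even rounds: most relevant candidate under the current profile.
            cand = max(pool, key=lambda c: relevance(profile, c))
        else:
            # Odd rounds: candidate most unlike what was already shown.
            cand = max(pool, key=lambda c: novelty(c, shown))
        pool.remove(cand)
        shown.append(cand)
        if oracle(cand):
            # Confirmed occurrences enrich the representation of the entity.
            accepted.append(cand)
            profile.update(tokenize(cand))
    return accepted, profile
```

Calling `discover` with a seed description, a list of candidate sentences and a function that plays the role of the user returns the confirmed occurrences together with the enriched term profile. A fixed alternation is the simplest possible interleaving; the schedule actually used in the paper may differ.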