BoostER: Leveraging Large Language Models for Enhancing Entity Resolution

ArXiv Pub Date: 2024-03-11 DOI: 10.1145/3589335.3651245

Huahang Li, Shuangyin Li, Fei Hao, C. Zhang, Yuanfeng Song, Lei Chen

Citations: 0

Abstract

Entity resolution, which involves identifying and merging records that refer to the same real-world entity, is a crucial task in areas such as Web data integration. Its importance is underscored by the abundance of duplicated and multi-version data resources on the Web. However, achieving high-quality entity resolution typically demands significant effort. Large Language Models (LLMs) such as GPT-4 have demonstrated advanced linguistic capabilities and may offer a new paradigm for this task. In this paper, we propose a demonstration system named BoostER that examines the possibility of leveraging LLMs in the entity resolution process and reveals advantages in both ease of deployment and low cost. Our approach optimally selects a set of matching questions, poses them to LLMs for verification, and then uses the LLMs' responses to refine the distribution of entity resolution results. This offers promising prospects for achieving high-quality entity resolution in real-world applications, especially for individuals or small companies, without the need for extensive model training or significant financial investment.
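The abstract does not detail how BoostER selects questions or refines the result distribution. The following Python sketch illustrates one plausible reading and is not the authors' implementation. It rests on several hypothetical assumptions:

- The entity resolution result is a probability distribution over a handful of candidate partitions of the records.
- A matching question is a record pair ("do a and b refer to the same entity?").
- The LLM answers correctly with a fixed, known `accuracy`.
- "Optimal" selection can be approximated by greedily minimizing the expected entropy of the distribution once the answers arrive.

All names (`select_questions`, `refine`, `ask`, and so on) are illustrative. `ask` stands in for a real LLM call and is simulated here from a known ground truth.

```python
import math
from itertools import combinations, product
from typing import Callable, Dict, FrozenSet, List, Tuple

Partition = FrozenSet[FrozenSet[str]]  # one candidate ER result: a set of clusters
Question = Tuple[str, str]             # "do records a and b refer to the same entity?"
Dist = Dict[Partition, float]          # probability of each candidate ER result


def partition(*clusters: List[str]) -> Partition:
    return frozenset(frozenset(c) for c in clusters)


def same_cluster(part: Partition, q: Question) -> bool:
    a, b = q
    return any(a in c and b in c for c in part)


def entropy(dist: Dist) -> float:
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def likelihood(part: Partition, q: Question, answer: bool, accuracy: float) -> float:
    # Assumed noise model (hypothetical): the LLM is right with a fixed probability.
    return accuracy if same_cluster(part, q) == answer else 1.0 - accuracy


def expected_entropy(dist: Dist, qs: List[Question], accuracy: float) -> float:
    """Expected entropy of the distribution after observing answers to all of qs."""
    total = 0.0
    for answers in product([True, False], repeat=len(qs)):
        joint = dict(dist)
        for q, ans in zip(qs, answers):
            joint = {part: p * likelihood(part, q, ans, accuracy) for part, p in joint.items()}
        z = sum(joint.values())  # probability of observing this answer vector
        if z > 0:
            total += z * entropy({part: p / z for part, p in joint.items()})
    return total


def select_questions(dist: Dist, budget: int, accuracy: float) -> List[Question]:
    """Greedily build a question set that minimizes expected posterior entropy."""
    records = sorted({r for part in dist for c in part for r in c})
    candidates = list(combinations(records, 2))
    chosen: List[Question] = []
    for _ in range(min(budget, len(candidates))):
        remaining = [q for q in candidates if q not in chosen]
        chosen.append(min(remaining, key=lambda q: expected_entropy(dist, chosen + [q], accuracy)))
    return chosen


def refine(dist: Dist, qs: List[Question], ask: Callable[[Question], bool], accuracy: float) -> Dist:
    """Bayesian update of the distribution with the (hypothetical) LLM's yes/no answers."""
    assert 0.5 < accuracy < 1.0, "accuracy must lie in (0.5, 1) for a well-defined update"
    for q in qs:
        ans = ask(q)
        dist = {part: p * likelihood(part, q, ans, accuracy) for part, p in dist.items()}
        z = sum(dist.values())
        dist = {part: p / z for part, p in dist.items()}
    return dist


# Usage: three candidate ER results over records r1, r2, r3.
P1 = partition(["r1", "r2"], ["r3"])
P2 = partition(["r1"], ["r2"], ["r3"])
P3 = partition(["r1", "r2", "r3"])
prior = {P1: 0.5, P2: 0.3, P3: 0.2}

qs = select_questions(prior, budget=2, accuracy=0.9)
truth = P1  # simulated ground truth standing in for LLM judgments
posterior = refine(prior, qs, ask=lambda q: same_cluster(truth, q), accuracy=0.9)
best = max(posterior, key=posterior.get)
print(qs)                                # → [('r1', 'r2'), ('r1', 'r3')]
print(best == P1, round(posterior[P1], 3))  # → True 0.9
```

Enumerating all 2^k answer vectors keeps the expected-entropy computation exact but only scales to small question budgets. Greedy set construction is a common approximation when finding the exactly optimal set is too expensive.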

Source journal: ArXiv
