Entity Matching using Large Language Models

arXiv (Cornell University) · Pub Date: 2023-10-17 · DOI: 10.48550/arxiv.2310.11244

Ralph Peeters, Christian Bizer


Citations: 1

Abstract

Entity matching is the task of deciding whether two entity descriptions refer to the same real-world entity. It is a central step in most data integration pipelines and an enabler for many e-commerce applications that need to match product offers from different vendors. State-of-the-art entity matching methods often rely on pre-trained language models (PLMs) such as BERT or RoBERTa. Two major drawbacks of these models for entity matching are that (i) the models require significant amounts of task-specific training data and (ii) the fine-tuned models are not robust to out-of-distribution entities. In this paper, we investigate using large language models (LLMs) for entity matching as a more robust alternative to PLM-based matchers that relies less on domain-specific training data. Our study covers hosted LLMs, such as GPT3.5 and GPT4, as well as open-source LLMs based on Llama2, which can be run locally. We evaluate these models in a zero-shot scenario as well as in a scenario where task-specific training data is available. In the zero-shot scenario, we compare different prompt designs as well as the prompt sensitivity of the models. In the second scenario, we investigate (i) the selection of in-context demonstrations, (ii) the generation of matching rules, and (iii) fine-tuning GPT3.5, using the same pool of training data across the different approaches. Our experiments show that GPT4 without any task-specific training data outperforms fine-tuned PLMs (RoBERTa and Ditto) on three out of five benchmark datasets, reaching F1 scores around 90%. The experiments with in-context learning and rule generation show that all models besides GPT4 benefit from these techniques (on average 5.9% and 2.2% F1), while GPT4 does not need such additional guidance in most cases...
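To make the zero-shot setup concrete, the following is a minimal sketch of how a pair of product offers might be turned into a yes/no prompt and how the resulting predictions could be scored with F1. The prompt wording, the function names, and the `toy_llm` stand-in are all assumptions for illustration; they are not the prompts, models, or code evaluated in the paper. A real experiment would replace `toy_llm` with a call to a hosted or local model.

```python
from typing import Callable, List, Sequence, Tuple


def build_zero_shot_prompt(left: str, right: str) -> str:
    """Serialize two offers into one question (hypothetical wording)."""
    return (
        "Do the following two product offers refer to the same real-world "
        "product? Answer with 'Yes' or 'No'.\n"
        f"Offer 1: {left}\n"
        f"Offer 2: {right}"
    )


def parse_answer(text: str) -> bool:
    """Map a free-text model answer to a match / non-match decision."""
    return text.strip().lower().startswith("yes")


def f1_score(predictions: Sequence[bool], labels: Sequence[bool]) -> float:
    """F1 of the positive (match) class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def evaluate(
    pairs: List[Tuple[str, str]],
    labels: Sequence[bool],
    llm: Callable[[str], str],
) -> float:
    """Prompt the model once per pair and score the parsed answers."""
    predictions = [
        parse_answer(llm(build_zero_shot_prompt(left, right)))
        for left, right in pairs
    ]
    return f1_score(predictions, labels)


def toy_llm(prompt: str) -> str:
    """Stand-in for a real model: says 'Yes' if the offers share a token."""
    lines = prompt.splitlines()
    tokens_1 = set(lines[-2].split(": ", 1)[1].lower().split())
    tokens_2 = set(lines[-1].split(": ", 1)[1].lower().split())
    return "Yes" if tokens_1 & tokens_2 else "No"


if __name__ == "__main__":
    example_pairs = [
        ("Sony WH-1000XM4 headphones", "sony wh-1000xm4 wireless"),
        ("Canon EOS R6", "Nikon Z6 II"),
    ]
    example_labels = [True, False]
    print(evaluate(example_pairs, example_labels, toy_llm))
```

Passing the model as a plain callable keeps the scoring logic independent of any particular API, so the same `evaluate` function could be reused when comparing prompt designs or adding in-context demonstrations to the prompt.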


