SKILL: Structured Knowledge Infusion for Large Language Models

Fedor Moiseev, Zhe Dong, Enrique Alfonseca, Martin Jaggi
{"title":"SKILL: Structured Knowledge Infusion for Large Language Models","authors":"Fedor Moiseev, Zhe Dong, Enrique Alfonseca, Martin Jaggi","doi":"10.18653/v1/2022.naacl-main.113","DOIUrl":null,"url":null,"abstract":"Large language models (LLMs) have demonstrated human-level performance on a vast spectrum of natural language tasks. However, it is largely unexplored whether they can better internalize knowledge from a structured data, such as a knowledge graph, or from text. In this work, we propose a method to infuse structured knowledge into LLMs, by directly training T5 models on factual triples of knowledge graphs (KGs). We show that models pre-trained on Wikidata KG with our method outperform the T5 baselines on FreebaseQA and WikiHop, as well as the Wikidata-answerable subset of TriviaQA and NaturalQuestions. The models pre-trained on factual triples compare competitively with the ones on natural language sentences that contain the same knowledge. Trained on a smaller size KG, WikiMovies, we saw 3x improvement of exact match score on MetaQA task. The proposed method has an advantage that no alignment between the knowledge graph and text corpus is required in curating training data. This makes our method particularly useful when working with industry-scale knowledge graphs.","PeriodicalId":382084,"journal":{"name":"North American Chapter of the Association for Computational Linguistics","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"28","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"North American Chapter of the Association for Computational Linguistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2022.naacl-main.113","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 28

Abstract

Large language models (LLMs) have demonstrated human-level performance on a vast spectrum of natural language tasks. However, it is largely unexplored whether they internalize knowledge better from structured data, such as a knowledge graph, or from text. In this work, we propose a method to infuse structured knowledge into LLMs by directly training T5 models on factual triples of knowledge graphs (KGs). We show that models pre-trained on the Wikidata KG with our method outperform the T5 baselines on FreebaseQA and WikiHop, as well as on the Wikidata-answerable subsets of TriviaQA and NaturalQuestions. Models pre-trained on factual triples compare competitively with those pre-trained on natural language sentences containing the same knowledge. Trained on a smaller KG, WikiMovies, we observed a 3x improvement in exact match score on the MetaQA task. The proposed method has the advantage that no alignment between the knowledge graph and a text corpus is required when curating training data. This makes it particularly useful when working with industry-scale knowledge graphs.
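The abstract's core recipe is to pre-train T5 directly on (subject, relation, object) triples rather than on sentences aligned with the KG. As a rough illustration of that idea, the Python sketch below shows one plausible way such triples could be serialized into text-to-text training pairs; the subject-plus-relation input and object target, and the helper names, are assumptions for illustration only, since the abstract does not specify the exact serialization the authors use.

def triple_to_example(subject, relation, obj):
    """Serialize one (subject, relation, object) KG triple into a seq2seq pair."""
    return {
        "input": f"{subject} {relation}",  # e.g. "The Matrix director"
        "target": obj,                     # e.g. "Lana Wachowski"
    }

def build_training_set(triples):
    """Turn a list of KG triples into training pairs; no KG-to-text alignment is needed."""
    return [triple_to_example(s, r, o) for (s, r, o) in triples]

if __name__ == "__main__":
    # Tiny hypothetical Wikidata-style KG for demonstration.
    kg = [
        ("Douglas Adams", "educated at", "St John's College"),
        ("The Matrix", "director", "Lana Wachowski"),
    ]
    for ex in build_training_set(kg):
        print(ex["input"], "->", ex["target"])

Pairs of this form could then be fed to a standard T5 fine-tuning loop; because both inputs and targets come straight from the graph, no alignment with a text corpus is required, which is the practical advantage the abstract highlights.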