Refining Wikidata Taxonomy using Large Language Models
Yiwen Peng (IP Paris), Thomas Bonald (IP Paris), Mehwish Alam (IP Paris)
arXiv:2409.04056 (arXiv - CS - Information Retrieval), 6 September 2024
Abstract
Due to its collaborative nature, Wikidata is known to have a complex taxonomy, with recurrent issues such as the ambiguity between instances and classes, the inaccuracy of some taxonomic paths, the presence of cycles, and a high level of redundancy across classes. Manual efforts to clean up this taxonomy are time-consuming and prone to errors or subjective decisions. We present WiKC, a new version of the Wikidata taxonomy cleaned automatically using a combination of Large Language Models (LLMs) and graph mining techniques. Operations on the taxonomy, such as cutting links or merging classes, are performed with the help of zero-shot prompting on an open-source LLM. The quality of the refined taxonomy is evaluated from both intrinsic and extrinsic perspectives, the latter on an entity typing task, demonstrating the practical value of WiKC.
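
The abstract does not specify the exact prompts or model used. As an illustration only, the sketch below shows what zero-shot prompting for a single taxonomy operation could look like: the prompt wording, the `query_llm` stub, and the KEEP/CUT labels are assumptions for this sketch, not the authors' actual setup.

```python
# Illustrative sketch (not the authors' code): zero-shot prompting an LLM
# to decide whether a Wikidata subclass-of link should be kept or cut.

def build_prompt(child: str, parent: str) -> str:
    """Format a zero-shot prompt for one taxonomic link."""
    return (
        "You are cleaning the Wikidata taxonomy.\n"
        f"Is '{child}' a valid subclass of '{parent}'?\n"
        "Answer with exactly one word: KEEP or CUT."
    )

def query_llm(prompt: str) -> str:
    """Placeholder for a call to an open-source LLM (e.g. a local inference
    server); replaced here by a canned answer so the sketch runs on its own."""
    return "CUT"

def refine_link(child: str, parent: str) -> bool:
    """Return True if the subclass-of link should be kept, False if cut."""
    answer = query_llm(build_prompt(child, parent)).strip().upper()
    return answer.startswith("KEEP")

if __name__ == "__main__":
    # Hypothetical example of the instance/class ambiguity the paper mentions:
    # a specific prize linked as a subclass of a broad class.
    print(refine_link("Nobel Prize in Physics", "award"))
```

In practice, such per-link decisions would be combined with the graph mining step mentioned in the abstract (e.g. to detect cycles and redundant classes) before any link is actually removed or merged.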