VisEval: A Benchmark for Data Visualization in the Era of Large Language Models.

IEEE Transactions on Visualization and Computer Graphics. Pub Date: 2024-09-10. DOI: 10.1109/TVCG.2024.3456320

Nan Chen, Yuge Zhang, Jiahang Xu, Kan Ren, Yuqing Yang


Citation count: 0

Abstract

Translating natural language to visualization (NL2VIS) has shown great promise for visual data analysis, but it remains a challenging task that requires multiple low-level implementations, such as natural language processing and visualization design. Recent advancements in pre-trained large language models (LLMs) are opening new avenues for generating visualizations from natural language. However, the lack of a comprehensive and reliable benchmark hinders our understanding of LLMs' capabilities in visualization generation. In this paper, we address this gap by proposing a new NL2VIS benchmark called VisEval. Firstly, we introduce a high-quality and large-scale dataset. This dataset includes 2,524 representative queries covering 146 databases, paired with accurately labeled ground truths. Secondly, we advocate for a comprehensive automated evaluation methodology covering multiple dimensions, including validity, legality, and readability. By systematically scanning for potential issues with a number of heterogeneous checkers, VisEval provides reliable and trustworthy evaluation outcomes. We run VisEval on a series of state-of-the-art LLMs. Our evaluation reveals prevalent challenges and delivers essential insights for future advancements.
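The evaluation idea described in the abstract, running several independent checkers that each cover one dimension (validity, legality, readability), can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the dict-based chart representation, the `CheckResult` type, the individual checker functions, and the `evaluate` and `pass_rate` helpers are hypothetical and are not VisEval's actual API or checker set.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CheckResult:
    dimension: str                 # "validity", "legality", or "readability"
    passed: bool
    reason: Optional[str] = None   # explanation when the check fails


# Hypothetical chart representation: a plain dict, not VisEval's own format.
Chart = dict
Checker = Callable[[Chart, Chart], CheckResult]


def check_validity(chart: Chart, truth: Chart) -> CheckResult:
    # Assumed validity test: did the generated code render a chart at all?
    ok = bool(chart.get("rendered"))
    return CheckResult("validity", ok, None if ok else "no chart was rendered")


def check_chart_type(chart: Chart, truth: Chart) -> CheckResult:
    # Assumed legality test: chart type must match the labeled ground truth.
    ok = chart.get("type") == truth.get("type")
    return CheckResult("legality", ok, None if ok else "chart type mismatch")


def check_data(chart: Chart, truth: Chart) -> CheckResult:
    # Assumed legality test: plotted data must match, ignoring row order.
    ok = sorted(chart.get("data", [])) == sorted(truth.get("data", []))
    return CheckResult("legality", ok, None if ok else "data mismatch")


def check_axis_labels(chart: Chart, truth: Chart) -> CheckResult:
    # Assumed readability test: both axes carry a non-empty label.
    ok = bool(chart.get("x_label")) and bool(chart.get("y_label"))
    return CheckResult("readability", ok, None if ok else "missing axis label")


CHECKERS: List[Checker] = [
    check_validity, check_chart_type, check_data, check_axis_labels,
]


def evaluate(chart: Chart, truth: Chart,
             checkers: List[Checker] = CHECKERS) -> List[CheckResult]:
    """Run each checker in order and collect its result."""
    results = []
    for checker in checkers:
        result = checker(chart, truth)
        results.append(result)
        # A chart that failed to render cannot be checked any further.
        if result.dimension == "validity" and not result.passed:
            break
    return results


def pass_rate(all_results: List[List[CheckResult]]) -> float:
    """Fraction of queries for which every executed check passed."""
    passed = sum(all(r.passed for r in results) for results in all_results)
    return passed / len(all_results)
```

Keeping each checker as a small independent function makes it easy to report which dimension a failed query falls under, rather than only a single pass/fail score.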


