Translating natural language to visualization (NL2VIS) has shown great promise for visual data analysis, but it remains a challenging task that requires multiple low-level implementations, such as natural language processing and visualization design. Recent advancements in pre-trained large language models (LLMs) are opening new avenues for generating visualizations from natural language. However, the lack of a comprehensive and reliable benchmark hinders our understanding of LLMs' capabilities in visualization generation. In this paper, we address this gap by proposing a new NL2VIS benchmark called VisEval. Firstly, we introduce a high-quality and large-scale dataset. This dataset includes 2,524 representative queries covering 146 databases, paired with accurately labeled ground truths. Secondly, we advocate for a comprehensive automated evaluation methodology covering multiple dimensions, including validity, legality, and readability. By systematically scanning for potential issues with a number of heterogeneous checkers, VisEval provides reliable and trustworthy evaluation outcomes. We run VisEval on a series of state-of-the-art LLMs. Our evaluation reveals prevalent challenges and delivers essential insights for future advancements.
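The checkers are only named at this level of detail, so the sketch below illustrates, under assumed identifiers and a simple dict-based chart representation, how heterogeneous checkers for validity, legality, and readability might be composed into one automated scan; none of the names reflect VisEval's actual API.

```python
# Hypothetical sketch of a multi-dimensional checker pipeline in the spirit of
# VisEval's automated evaluation; Checker names and the chart dict are illustrative.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class CheckResult:
    dimension: str   # "validity", "legality", or "readability"
    passed: bool
    detail: str

def check_validity(chart: dict) -> CheckResult:
    # Validity: the generated code ran and produced a chart at all.
    ok = chart.get("rendered", False)
    return CheckResult("validity", ok, "chart rendered" if ok else "no output produced")

def check_legality(chart: dict) -> CheckResult:
    # Legality: the chart uses the mark type the query asked for.
    ok = chart.get("mark") == chart.get("expected_mark")
    return CheckResult("legality", ok, f"mark={chart.get('mark')}")

def check_readability(chart: dict) -> CheckResult:
    # Readability: basic presentation issues such as missing axis titles.
    ok = bool(chart.get("x_title")) and bool(chart.get("y_title"))
    return CheckResult("readability", ok, "axis titles present" if ok else "missing axis title")

def run_checkers(chart: dict, checkers: List[Callable[[dict], CheckResult]]) -> List[CheckResult]:
    # Each heterogeneous checker scans the chart independently; results are aggregated.
    return [checker(chart) for checker in checkers]

if __name__ == "__main__":
    chart = {"rendered": True, "mark": "bar", "expected_mark": "bar",
             "x_title": "Country", "y_title": "GDP"}
    for result in run_checkers(chart, [check_validity, check_legality, check_readability]):
        print(result.dimension, "PASS" if result.passed else "FAIL", "-", result.detail)
```

Aggregating such per-dimension results across models is what lets an automated benchmark localize failures rather than report only an overall pass rate.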
Tactics play an important role in team sports by guiding how players interact on the field. Both sports fans and experts need to analyze sports tactics. Existing approaches allow users to visually perceive the multivariate tactical effects. However, these approaches require users to go through a complex reasoning process to connect the multiple interactions within each tactic to its final effect. In this work, we collaborate with basketball experts and propose a progressive approach to help users gain a deeper understanding of how each tactic works and customize tactics on demand. Users can progressively sketch on a tactic board, and a coach agent simulates the possible actions at each step and presents the simulation to users with facet visualizations. We develop an extensible framework that integrates large language models (LLMs) and visualizations to help users communicate with the coach agent through multimodal inputs. Based on this framework, we design and develop Smartboard, an agent-based interactive visualization system for fine-grained tactical analysis, especially for play design. Smartboard provides users with a structured process of setup, simulation, and evolution, allowing for iterative exploration of tactics based on specific personalized scenarios. We conduct case studies on real-world basketball datasets to demonstrate the effectiveness and usefulness of our system.
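As a rough illustration of the setup, simulation, and evolution loop described above, the following sketch stands in for the coach agent with a stubbed function; `ask_coach_agent`, the tactic-sketch format, and the returned actions are hypothetical placeholders rather than Smartboard's implementation.

```python
# Illustrative coach-agent loop: a sketched tactic plus a text instruction is turned
# into step-by-step simulated actions that a UI could render as facet visualizations.
from typing import Dict, List

def ask_coach_agent(tactic_sketch: List[Dict], instruction: str) -> List[Dict]:
    # Placeholder for the multimodal LLM call: given player positions drawn on the
    # tactic board and a natural-language instruction, return one action per step.
    return [{"step": i + 1, "player": p["player"], "action": "cut to basket"}
            for i, p in enumerate(tactic_sketch)]

def simulate(tactic_sketch: List[Dict], instruction: str) -> None:
    # Setup -> simulation -> evolution: simulate actions, show them, and let the
    # user refine the sketch or instruction in the next iteration.
    for act in ask_coach_agent(tactic_sketch, instruction):
        print(f"Step {act['step']}: {act['player']} -> {act['action']}")

if __name__ == "__main__":
    sketch = [{"player": "PG", "x": 0.5, "y": 0.8}, {"player": "C", "x": 0.5, "y": 0.3}]
    simulate(sketch, "run a pick-and-roll against a switching defense")
```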
The visualization community has a rich history of reflecting upon visualization design flaws. Although research in this area has remained lively, we believe it is essential to continuously revisit this classic and critical topic in visualization research by incorporating more empirical evidence from diverse sources, characterizing new design flaws, building more systematic theoretical frameworks, and understanding the underlying reasons for these flaws. To address the above gaps, this work investigated visualization design flaws through the lens of the public, constructed a framework to summarize and categorize the identified flaws, and explored why these flaws occur. Specifically, we analyzed 2,227 flawed data visualizations collected from an online gallery and derived a design task-associated taxonomy containing 76 specific design flaws. These flaws were further classified into three high-level categories (i.e., misinformation, uninformativeness, unsociability) and ten subcategories (e.g., inaccuracy, unfairness, ambiguity). Next, we organized five focus groups to explore why these design flaws occur and identified seven causes of the flaws. Finally, we proposed a research agenda for combating visualization design flaws and summarized nine research opportunities.
The increasing reliance on Large Language Models (LLMs) for health information seeking can pose severe risks due to the potential for misinformation and the complexity of these topics. This paper introduces KNOWNET, a visualization system that integrates LLMs with Knowledge Graphs (KGs) to provide enhanced accuracy and structured exploration. Specifically, for enhanced accuracy, KNOWNET extracts triples (e.g., entities and their relations) from LLM outputs and maps them to validated information and supporting evidence in external KGs. For structured exploration, KNOWNET provides next-step recommendations based on the neighborhood of the currently explored entities in KGs, aiming to guide a comprehensive understanding without overlooking critical aspects. To enable reasoning with both the structured data in KGs and the unstructured outputs from LLMs, KNOWNET conceptualizes the understanding of a subject as the gradual construction of a graph visualization. A progressive graph visualization is introduced to monitor past inquiries and bridge the current query with the exploration history and next-step recommendations. We demonstrate the effectiveness of our system via use cases and expert interviews.
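A minimal sketch of the grounding idea, assuming a toy triple format and an in-memory stand-in for the external KG; the extraction, lookup, and recommendation rules shown here are illustrative simplifications, not KNOWNET's actual pipeline.

```python
# Ground LLM output in a knowledge graph: verify extracted (subject, relation, object)
# triples against KG edges, then recommend unvisited neighbors to explore next.
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Toy stand-in for a validated external KG (e.g., a biomedical KG).
KG_EDGES: Set[Triple] = {
    ("fish oil", "may_help", "heart health"),
    ("fish oil", "interacts_with", "blood thinners"),
    ("vitamin d", "supports", "bone health"),
}

def verify_triples(triples: List[Triple]) -> List[Tuple[Triple, bool]]:
    # Map every LLM-extracted triple to KG evidence; unmatched triples are flagged.
    return [(t, t in KG_EDGES) for t in triples]

def recommend_next(entity: str, visited: Set[str]) -> List[str]:
    # Next-step recommendations: unvisited KG neighbors of the current entity.
    neighbors = {o for s, _, o in KG_EDGES if s == entity} | {s for s, _, o in KG_EDGES if o == entity}
    return sorted(neighbors - visited)

if __name__ == "__main__":
    llm_triples = [("fish oil", "may_help", "heart health"),
                   ("fish oil", "cures", "arthritis")]  # second triple has no KG support
    for triple, supported in verify_triples(llm_triples):
        print(triple, "supported" if supported else "unverified")
    print("explore next:", recommend_next("fish oil", visited={"heart health"}))
```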
When using exploratory visual analysis to examine multivariate hierarchical data, users often need to query data to narrow down the scope of analysis. However, formulating effective query expressions remains a challenge for multivariate hierarchical data, particularly when datasets become very large. To address this issue, we develop a declarative grammar, HiRegEx (Hierarchical data Regular Expression), for querying and exploring multivariate hierarchical data. Rooted in the extended multi-level task topology framework for tree visualizations (e-MLTT), HiRegEx delineates three query targets (node, path, and subtree) and two aspects for querying these targets (features and positions), and uses operators developed based on classical regular expressions for query construction. Based on the HiRegEx grammar, we develop an exploratory framework for querying and exploring multivariate hierarchical data and integrate it into the TreeQueryER prototype system. The exploratory framework includes three major components: top-down pattern specification, bottom-up data-driven inquiry, and context-creation data overview. We validate the expressiveness of HiRegEx with the tasks from the e-MLTT framework and showcase the utility and effectiveness of the TreeQueryER system through a case study involving expert users in the analysis of a citation tree dataset.
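HiRegEx's concrete syntax is not reproduced here; the sketch below only illustrates, with hypothetical helper functions over a toy tree, how node and path targets might be matched by chaining feature predicates in the manner of regular-expression concatenation.

```python
# Toy node/path matching over a multivariate tree, in the spirit of query targets
# built from feature predicates; not HiRegEx's actual grammar or operators.
from typing import Callable, Dict, List

Node = Dict  # {"name": str, "value": float, "children": [Node, ...]}

def match_nodes(root: Node, predicate: Callable[[Node], bool]) -> List[Node]:
    # Node target: collect every node whose features satisfy the predicate.
    matches = [root] if predicate(root) else []
    for child in root.get("children", []):
        matches.extend(match_nodes(child, predicate))
    return matches

def match_paths(root: Node, predicates: List[Callable[[Node], bool]]) -> List[List[str]]:
    # Path target: root-to-descendant paths whose nodes satisfy the predicates in
    # sequence, analogous to concatenation in a classical regular expression.
    if not predicates or not predicates[0](root):
        return []
    if len(predicates) == 1:
        return [[root["name"]]]
    paths = []
    for child in root.get("children", []):
        for tail in match_paths(child, predicates[1:]):
            paths.append([root["name"]] + tail)
    return paths

if __name__ == "__main__":
    tree = {"name": "root", "value": 10, "children": [
        {"name": "a", "value": 120, "children": [{"name": "a1", "value": 300, "children": []}]},
        {"name": "b", "value": 5, "children": []}]}
    print([n["name"] for n in match_nodes(tree, lambda n: n["value"] > 100)])
    print(match_paths(tree, [lambda n: True, lambda n: n["value"] > 100, lambda n: n["value"] > 200]))
```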
Inspired by recent advances in digital fabrication, artists and scientists have demonstrated that physical data encodings (i.e., data physicalizations) can increase engagement with data, foster collaboration, and in some cases, improve data legibility and analysis relative to digital alternatives. However, prior empirical studies have only investigated abstract data encoded in physical form (e.g., laser cut bar charts) and not continuously sampled spatial data fields relevant to climate and medical science (e.g., heights, temperatures, densities, and velocities sampled on a spatial grid). This paper presents the design and results of the first study to characterize human performance in 3D spatial data analysis tasks across analogous physical and digital visualizations. Participants analyzed continuous spatial elevation data with three visualization modalities: (1) 2D digital visualization; (2) perspective-tracked, stereoscopic "fishtank" virtual reality; and (3) 3D printed data physicalization. Their tasks included tracing paths downhill, looking up spatial locations and comparing their relative heights, and identifying and reporting the minimum and maximum heights within certain spatial regions. As hypothesized, in most cases, participants performed the tasks as well as or better with the physical modality (based on time and error metrics). Additional results include an analysis of open-ended feedback from participants and discussion of implications for further research on the value of data physicalization. All data and supplemental materials are available at https://osf.io/7xdq4/.
Projecting high-dimensional vectors into two dimensions for visualization, known as embedding visualization, facilitates perceptual reasoning and interpretation. Comparing multiple embedding visualizations drives decision-making in many domains, but traditional comparison methods are limited by a reliance on direct point correspondences. This requirement precludes comparisons without point correspondences, such as two different datasets of annotated images, and fails to capture meaningful higher-level relationships among point groups. To address these shortcomings, we propose a general framework for comparing embedding visualizations based on shared class labels rather than individual points. Our approach partitions points into regions corresponding to three key class concepts (confusion, neighborhood, and relative size) to characterize intra- and inter-class relationships. Informed by a preliminary user study, we implemented our framework using perceptual neighborhood graphs to define these regions and introduced metrics to quantify each concept. We demonstrate the generality of our framework with usage scenarios from machine learning and single-cell biology, highlighting our metrics' ability to draw insightful comparisons across label hierarchies. To assess the effectiveness of our approach, we conducted an evaluation study with five machine learning researchers and six single-cell biologists using an interactive and scalable prototype built with Python, JavaScript, and Rust. Our metrics enable more structured comparisons through visual guidance and increased participants' confidence in their findings.
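As a simplified illustration of the class-level metrics, the snippet below estimates a confusion-style matrix from nearest-neighbor label composition; it substitutes plain k-nearest neighbors for the perceptual neighborhood graphs described above and is an assumption-laden sketch rather than the prototype's implementation.

```python
# Class-level "confusion" from a 2D embedding: for each class, the average label
# composition of its points' k nearest neighbors.
import numpy as np

def knn_confusion(points: np.ndarray, labels: np.ndarray, k: int = 10) -> np.ndarray:
    # Returns an (n_classes x n_classes) matrix: row c gives the average fraction
    # of neighbors of class-c points that belong to each class.
    classes = np.unique(labels)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)            # exclude each point itself
    neighbors = np.argsort(dists, axis=1)[:, :k]
    confusion = np.zeros((len(classes), len(classes)))
    for ci, c in enumerate(classes):
        neighbor_labels = labels[neighbors[labels == c]]   # (n_c, k) neighbor labels
        for cj, c2 in enumerate(classes):
            confusion[ci, cj] = np.mean(neighbor_labels == c2)
    return confusion

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
    labs = np.array([0] * 50 + [1] * 50)
    # Comparing such matrices across two embeddings of the same classes indicates
    # which class pairs become more or less entangled, without point correspondences.
    print(np.round(knn_confusion(pts, labs, k=10), 2))
```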