Low-Barrier Dataset Collection With Real Human Body for Interactive Per-Garment Virtual Try-On
Pub Date : 2025-12-31 | DOI: 10.1109/MCG.2025.3649499
Zaiqiang Wu, Yechen Li, Jingyuan Liu, Yuki Shibata, Takayuki Hori, I-Chao Shen, Takeo Igarashi
Existing image-based virtual try-on methods are limited to frontal views and lack real-time performance. While per-garment virtual try-on methods have tackled these issues by adopting per-garment training, they still encounter practical limitations: (1) the robotic mannequin used for per-garment dataset collection is prohibitively expensive; (2) the synthesized garments often misalign with the human body. To address these challenges, we propose a low-barrier approach to collecting per-garment datasets using real human bodies, eliminating the need for an expensive robotic mannequin and reducing data collection time from 2 hours to 2 minutes. Additionally, we introduce a hybrid person representation that ensures precise human-garment alignment. We conducted qualitative and quantitative comparisons with state-of-the-art image-based virtual try-on methods to demonstrate the superiority of our method in terms of image quality and temporal consistency. Furthermore, most participants in our user study found the system effective in supporting garment purchasing decisions.
{"title":"Low-Barrier Dataset Collection With Real Human Body for Interactive Per-Garment Virtual Try-On.","authors":"Zaiqiang Wu, Yechen Li, Jingyuan Liu, Yuki Shibata, Takayuki Hori, I-Chao Shen, Takeo Igarashi","doi":"10.1109/MCG.2025.3649499","DOIUrl":"https://doi.org/10.1109/MCG.2025.3649499","url":null,"abstract":"<p><p>Existing image-based virtual try-on methods are limited to frontal views and lack real-time performance. While per-garment virtual try-on methods have tackled these issues by adopting per-garment training, they still encounter practical limitations: (1) the robotic mannequin used for per-garment datasets collection is prohibitively expensive; (2) the synthesized garments often misalign with the human body. To address these challenges, we propose a low-barrier approach to collect per-garment datasets using real human bodies, eliminating the need for an expensive robotic mannequin and reducing data collection time from 2 hours to 2 minutes. Additionally, we introduce a hybrid person representation that ensures precise human-garment alignment. We conducted qualitative and quantitative comparisons with state-of-the-art image-based virtual try-on methods to demonstrate the superiority of our method regarding image quality and temporal consistency. Furthermore, most participants in our user study found the system effective in supporting garment purchasing decisions.</p>","PeriodicalId":55026,"journal":{"name":"IEEE Computer Graphics and Applications","volume":"PP ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2025-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145879409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Experiencing Data Visualization with Language Disability
Pub Date : 2025-12-10 | DOI: 10.1109/MCG.2025.3642747
Jo Wood, Niamh Devane, Abi Roper, Nicola Botting, Madeline Cruice, Ulfa Octaviani, Stephanie Wilson
Current data visualization research demonstrates very limited inclusion of users with language disabilities. To address this gap, this paper introduces two language disabilities: Developmental Language Disorder (DLD) and aphasia. We present outcomes from a novel qualitative diary study exploring whether people living with either DLD or aphasia experience and engage with data visualization in their day-to-day lives. Outcomes reveal evidence of both exposure to, and engagement with, data visualization across a week-long period, alongside experiences of both inclusion in and exclusion from the benefits of data visualization. We report the types of data visualization tasks and application domains encountered, and describe the issues participants experienced. Findings highlight a critical need for increased awareness of language access needs within the discipline of data visualization and make a case for further research into design practices inclusive of people with language disabilities.
{"title":"Experiencing Data Visualization with Language Disability.","authors":"Jo Wood, Niamh Devane, Abi Roper, Nicola Botting, Madeline Cruice, Ulfa Octaviani, Stephanie Wilson","doi":"10.1109/MCG.2025.3642747","DOIUrl":"https://doi.org/10.1109/MCG.2025.3642747","url":null,"abstract":"<p><p>Current data visualization research demonstrates very limited inclusion of users with language disabilities. To address this, this paper introduces the language disabilities Developmental Language Disorder (DLD) and aphasia. We present outcomes from a novel qualitative diary study exploring whether people living with either DLD or aphasia experience and engage with data visualization in their day-to-day lives. Outcomes reveal evidence of both exposure to, and engagement with, data visualization across a week-long period alongside accompanying experiences of inclusion and exclusion of the benefits of data visualization. We report types of data visualization tasks and application domains encountered and descriptions of issues experienced by participants. Findings highlight a critical need for increased awareness of language access needs within the discipline of data visualization and a case for further research into design practices inclusive of people with language disabilities.</p>","PeriodicalId":55026,"journal":{"name":"IEEE Computer Graphics and Applications","volume":"PP ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2025-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145727348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HyShare: Hybrid Sample and Shading Reuse for Real-time Photorealistic Rendering
Pub Date : 2025-11-28 | DOI: 10.1109/MCG.2025.3638242
Yubin Zhou, Xiyun Song, Zhiqiang Lao, Yu Guo, Zongfang Lin, Heather Yu, Liang Peng
Real-time path tracing is computationally expensive due to intensive path sampling and shading, especially under high-frame-rate and high-resolution demands. We present HyShare, a hybrid reuse algorithm that integrates ReSTIR-style path sample reuse with adaptive shading reuse across spatial and temporal domains. Unlike prior methods that treat each kind of reuse in isolation, HyShare jointly optimizes both, addressing their interdependencies while maintaining image fidelity. To prevent artifacts caused by stale data and correlation, we introduce per-pixel validation and dynamic refresh mechanisms. Our system adaptively disables reuse in motion-sensitive regions using radiance and geometric change checks. Evaluated on complex dynamic scenes, HyShare outperforms state-of-the-art baselines, including ReSTIR DI, ReSTIR PT, and Area ReSTIR, improving rendering speed by 37.4% and boosting image quality (PSNR +1.8 dB, SSIM +0.17). These results demonstrate the effectiveness and generalizability of HyShare in advancing real-time photorealistic rendering.
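As a concrete illustration of the per-pixel validation idea, the sketch below shows one way a reuse-or-refresh decision could be made from radiance and geometry buffers. It is not the authors' implementation; the buffer layout, function name, and thresholds are assumptions chosen for illustration.

```python
# Illustrative sketch, not the authors' implementation: a per-pixel check that
# disables temporal reuse where radiance or geometry changed too much. Buffer
# shapes ((H, W, 3) for radiance/normals, (H, W) for depth) and all thresholds
# are assumptions chosen for illustration.
import numpy as np

def reuse_mask(prev_radiance, curr_radiance, prev_normals, curr_normals,
               prev_depth, curr_depth,
               radiance_tol=0.15, normal_tol=0.9, depth_tol=0.05):
    """Return a boolean (H, W) mask: True where reusing last frame's samples looks safe."""
    # Relative change in luminance, guarding against division by zero.
    lum_prev = prev_radiance.mean(axis=-1)
    lum_curr = curr_radiance.mean(axis=-1)
    radiance_change = np.abs(lum_curr - lum_prev) / np.maximum(lum_prev, 1e-4)

    # Geometric change: cosine similarity of normals and relative depth difference.
    normal_sim = np.sum(prev_normals * curr_normals, axis=-1)
    depth_change = np.abs(curr_depth - prev_depth) / np.maximum(curr_depth, 1e-4)

    return (radiance_change < radiance_tol) & \
           (normal_sim > normal_tol) & \
           (depth_change < depth_tol)
```

In a real renderer such a test would run on the GPU per pixel; the point here is only the shape of the decision, combining a radiance check with geometric checks before any sample or shading result is reused.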
{"title":"HyShare: Hybrid Sample and Shading Reuse for Real-time Photorealistic Rendering.","authors":"Yubin Zhou, Xiyun Song, Zhiqiang Lao, Yu Guo, Zongfang Lin, Heather Yu, Liang Peng","doi":"10.1109/MCG.2025.3638242","DOIUrl":"https://doi.org/10.1109/MCG.2025.3638242","url":null,"abstract":"<p><p>Real-time path tracing is computationally expensive due to intensive path sampling and shading, especially under high frame rate and high resolution demands. We present HyShare, a hybrid reuse algorithm that integrates ReSTIR-style path sample reuse with adaptive shading reuse across spatial and temporal domains. Unlike prior methods that treat reuse in isolation, HyShare jointly optimizes both reuse types, addressing their interdependencies while maintaining image fidelity. To prevent artifacts caused by stale data and correlation, we introduce per-pixel validation and dynamic refresh mechanisms. Our system adaptively disables reuse in motion-sensitive regions using radiance and geometric change checks. Evaluated on complex dynamic scenes, HyShare outperforms state-of-the-art baselines-including ReSTIR DI, ReSTIR PT, and Area ReSTIR-improving rendering speed by 37.4% and boosting image quality (PSNR +1.8 dB, SSIM +0.17). These results demonstrate the effectiveness and generalizability of HyShare in advancing real-time photorealistic rendering.</p>","PeriodicalId":55026,"journal":{"name":"IEEE Computer Graphics and Applications","volume":"PP ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2025-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145642820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Learning-based Eddy Segmentation with Vector-Data for Biochemical Analysis in Ocean Simulations
Pub Date : 2025-11-07 | DOI: 10.1109/MCG.2025.3630582
Weiping Hua, Sedat Ozer, Karen Bemis, Zihan Liu, Deborah Silver
Eddies are dynamic, swirling structures in ocean circulation that significantly influence the distribution of heat, nutrients, and plankton, thereby impacting marine biological processes. Accurate eddy segmentation from ocean simulation data is essential for enabling subsequent biological and physical analysis. However, leveraging vector-valued inputs, such as ocean velocity fields, in deep learning-based segmentation models poses unique challenges due to the complexity of representing the vector input in multiple combinations for training. In this paper, we discuss these challenges and provide our solutions. In particular, we present a detailed study of multiple input encoding strategies, including raw velocity components, vector magnitude, and angular direction, and their impact on eddy segmentation performance. We introduce a two-branch attention U-Net architecture that separately encodes vector magnitude and direction. We evaluate seven different network configurations across four large-scale 3D ocean simulation data sets, employing four different segmentation metrics. Our results demonstrate that the proposed two-branch architecture consistently outperforms single-branch variants.
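The sketch below illustrates the kind of input encodings the study compares, deriving magnitude and angular-direction channels from raw velocity components. The 2D slice shape, function name, and sin/cos angle encoding are assumptions for illustration; the 3D data handling and the attention U-Net itself are omitted.

```python
# A minimal sketch of vector-field input encodings, shown on a 2D velocity
# slice of shape (H, W, 2). The two returned arrays would feed the magnitude
# and direction branches of a two-branch segmentation network (not shown).
import numpy as np

def encode_velocity(uv):
    """Derive magnitude and angular-direction encodings from raw (u, v) components."""
    u, v = uv[..., 0], uv[..., 1]
    magnitude = np.sqrt(u ** 2 + v ** 2)          # input to the magnitude branch
    angle = np.arctan2(v, u)                      # flow direction in radians
    # Encode the angle as (sin, cos) so the wrap-around at +/- pi stays continuous.
    direction = np.stack([np.sin(angle), np.cos(angle)], axis=-1)  # direction branch
    return magnitude[..., None], direction

# Example: mag, direction = encode_velocity(np.random.randn(128, 128, 2))
```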
{"title":"Deep Learning-based Eddy Segmentation with Vector-Data for Biochemical Analysis in Ocean Simulations.","authors":"Weiping Hua, Sedat Ozer, Karen Bemis, Zihan Liu, Deborah Silver","doi":"10.1109/MCG.2025.3630582","DOIUrl":"https://doi.org/10.1109/MCG.2025.3630582","url":null,"abstract":"<p><p>Eddies are dynamic, swirling structures in ocean circulation that significantly influence the distribution of heat, nutrients, and plankton, there by impacting marine biological processes. Accurate eddy segmentation from ocean simulation data is essential for enabling subsequent biological and physical analysis. However, leveraging vector-valued inputs, such as ocean velocity fields, in deep learning-based segmentation models poses unique challenges due to the complexity of representing the vector input in multiple combinations for training. In this paper, we discuss such challenges and provide our solutions. In particular, we present a detailed study into multiple input encoding strategies, including raw velocity components, vector magnitude, and angular direction, and their impacton eddy segmentation performance. We introduce a two-branch attention U-Net architecture that separately encodes vector magnitude and direction. We evaluate seven different network configurations across four large-scale 3D ocean simulation data sets, employing four different segmentation metrics. Our results demonstrate that the proposed two-branch architecture consistently out performs single-branch variants.</p>","PeriodicalId":55026,"journal":{"name":"IEEE Computer Graphics and Applications","volume":"PP ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2025-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145472533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing Visual Analysis in Person Reidentification With Vision-Language Models
Pub Date : 2025-11-01 | DOI: 10.1109/MCG.2025.3593227
Wang Xia, Tianci Wang, Jiawei Li, Guodao Sun, Haidong Gao, Xu Tan, Ronghua Liang
Image-based person reidentification aims to match individuals across multiple cameras. Despite advances in machine learning, the effectiveness of existing methods in real-world scenarios remains limited, often leaving users to handle fine-grained matching manually. Recent work has explored textual information as an auxiliary cue, but existing methods generate coarse descriptions and fail to integrate them effectively into retrieval workflows. To address these issues, we adopt a vision-language model fine-tuned with domain-specific knowledge to generate detailed textual descriptions and keywords for pedestrian images. We then create a joint search space combining visual and textual information, using image clustering and keyword co-occurrence to build a semantic layout. In addition, we introduce a dynamic spiral word cloud algorithm to improve visual presentation and enhance semantic associations. Finally, we conduct case studies and a user study and gather expert feedback, demonstrating the usability and effectiveness of our system.
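One small ingredient of such a pipeline is counting keyword co-occurrence over the generated descriptions; a minimal sketch with toy data follows. The function name and toy keywords are illustrative assumptions, and the vision-language model, clustering, and word-cloud layout are not shown.

```python
# Illustrative sketch: count how often two keywords are assigned to the same
# pedestrian image, the raw statistic behind a co-occurrence-based layout.
from collections import Counter
from itertools import combinations

def keyword_cooccurrence(keyword_lists):
    """Count pairwise keyword co-occurrence across per-image keyword lists."""
    counts = Counter()
    for keywords in keyword_lists:
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
    return counts

counts = keyword_cooccurrence([
    ["red jacket", "backpack", "jeans"],
    ["red jacket", "jeans"],
    ["backpack", "sneakers"],
])
print(counts[("jeans", "red jacket")])  # -> 2
```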
{"title":"Enhancing Visual Analysis in Person Reidentification With Vision-Language Models.","authors":"Wang Xia, Tianci Wang, Jiawei Li, Guodao Sun, Haidong Gao, Xu Tan, Ronghua Liang","doi":"10.1109/MCG.2025.3593227","DOIUrl":"10.1109/MCG.2025.3593227","url":null,"abstract":"<p><p>Image-based person reidentification aims to match individuals across multiple cameras. Despite advances in machine learning, their effectiveness in real-world scenarios remains limited, often leaving users to handle fine-grained matching manually. Recent work has explored textual information as auxiliary cues, but existing methods generate coarse descriptions and fail to integrate them effectively into retrieval workflows. To address these issues, we adopt a vision-language model fine-tuned with domain-specific knowledge to generate detailed textual descriptions and keywords for pedestrian images. We then create a joint search space combining visual and textual information, using image clustering and keyword co-occurrence to build a semantic layout. In addition, we introduce a dynamic spiral word cloud algorithm to improve visual presentation and enhance semantic associations. Finally, we conduct case studies, a user study, and expert feedback, demonstrating the usability and effectiveness of our system.</p>","PeriodicalId":55026,"journal":{"name":"IEEE Computer Graphics and Applications","volume":"PP ","pages":"44-60"},"PeriodicalIF":1.4,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144735526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Do Language Model Agents Align With Humans in Rating Visualizations? An Empirical Study
Pub Date : 2025-11-01 | DOI: 10.1109/MCG.2025.3586461
Zekai Shao, Yi Shan, Yixuan He, Yuxuan Yao, Junhong Wang, Xiaolong Zhang, Yu Zhang, Siming Chen
Large language models (LLMs) show potential in understanding visualizations and may capture design knowledge. However, their ability to predict human feedback remains unclear. To explore this, we conduct three studies evaluating the alignment between LLM-based agents and human ratings in visualization tasks. The first study replicates a human-subject study, showing promising agent performance in human-like reasoning and rating, and informing further experiments. The second study simulates six prior studies using agents and finds that alignment correlates with experts' pre-experiment confidence. The third study tests enhancement techniques, such as input preprocessing and knowledge injection, revealing limitations in robustness and potential bias. These findings suggest that LLM-based agents can simulate human ratings when guided by high-confidence hypotheses from expert evaluators. We also demonstrate a usage scenario in rapidly prototyping study designs and discuss future directions. We note that such simulation can only complement, and cannot replace, user studies.
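To give a feel for what "agents rating visualizations" can look like in practice, here is a heavily simplified sketch: an agent is prompted for a Likert rating and the replies are aggregated. `call_llm` is a placeholder for whichever model client is available, and the prompt wording, 7-point scale, and parsing are assumptions rather than the study's protocol.

```python
# Heavily simplified sketch of simulating participant ratings with an LLM agent.
# `call_llm` is a placeholder; plug in a real model client before use.
import statistics

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your own model client here")

def simulate_ratings(chart_description: str, question: str, n_agents: int = 5):
    prompt = (
        f"You are a participant in a visualization study. {question}\n"
        f"Chart description: {chart_description}\n"
        "Answer with a single integer from 1 (strongly disagree) to 7 (strongly agree)."
    )
    ratings = []
    for _ in range(n_agents):
        reply = call_llm(prompt)
        # Keep the first integer in range; discard malformed replies.
        numbers = [int(tok) for tok in reply.split() if tok.isdigit()]
        if numbers and 1 <= numbers[0] <= 7:
            ratings.append(numbers[0])
    return statistics.mean(ratings) if ratings else None
```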
{"title":"Do Language Model Agents Align With Humans in Rating Visualizations? An Empirical Study.","authors":"Zekai Shao, Yi Shan, Yixuan He, Yuxuan Yao, Junhong Wang, Xiaolong Zhang, Yu Zhang, Siming Chen","doi":"10.1109/MCG.2025.3586461","DOIUrl":"10.1109/MCG.2025.3586461","url":null,"abstract":"<p><p>Large language models (LLMs) show potential in understanding visualizations and may capture design knowledge. However, their ability to predict human feedback remains unclear. To explore this, we conduct three studies evaluating the alignment between LLM-based agents and human ratings in visualization tasks. The first study replicates a human-subject study, showing promising agent performance in human-like reasoning and rating, and informing further experiments. The second study simulates six prior studies using agents and finds that alignment correlates with experts' pre-experiment confidence. The third study tests enhancement techniques, such as input preprocessing and knowledge injection, revealing limitations in robustness and potential bias. These findings suggest that LLM-based agents can simulate human ratings when guided by high-confidence hypotheses from expert evaluators. We also demonstrate the usage scenario in rapid prototyping study designs and discuss future directions. We note that simulation may only serve as complements and cannot replace user studies.</p>","PeriodicalId":55026,"journal":{"name":"IEEE Computer Graphics and Applications","volume":"PP ","pages":"14-28"},"PeriodicalIF":1.4,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144602294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing Pediatric Liver Transplant Therapy With Virtual Reality
Pub Date : 2025-11-01 | DOI: 10.1109/MCG.2025.3613129
Laura Raya, Alberto Sanchez, Carmen Martin, Jose Jesus Garcia Rueda, Erika Guijarro, Mike Potel
Surgery and hospital stays for pediatric transplantation confront the patient with frequent interventions requiring complete sedation, demands of care and self-care, assimilation of the disease, and anxiety. This article presents the development of a comprehensive tool called virtual transplant reality (VTR), currently used in a hospital with actual patients. Our tool is intended to aid the psychological support of children who have undergone a liver transplant. VTR consists of two applications: a virtual reality application with a head-mounted display worn by the patient and a desktop application for the therapist. After tests carried out at the Hospital Universitario La Paz (Madrid, Spain) over a period of one year with 65 patients, the results indicate that our system offers a series of advantages as a complement to the psychological therapy of pediatric transplant patients.
{"title":"Enhancing Pediatric Liver Transplant Therapy With Virtual Reality.","authors":"Laura Raya, Alberto Sanchez, Carmen Martin, Jose Jesus Garcia Rueda, Erika Guijarro, Mike Potel","doi":"10.1109/MCG.2025.3613129","DOIUrl":"10.1109/MCG.2025.3613129","url":null,"abstract":"<p><p>Surgery and hospital stays for pediatric transplantation involve frequent interventions that require complete sedation, care and self-care, disease assimilation, and anxiety for the patient. This article presents the development of a comprehensive tool called virtual transplant reality (VTR) currently used in a hospital with actual patients. Our tool is intended to provide an aid to the psychological support of children who have undergone a liver transplant. VTR consists of two applications: a virtual reality application with a head mounted display worn by the patient and a desktop application for the therapist. After tests carried out at the Hospital Universitario La Paz (Madrid, Spain) over a period of one year with 65 patients, the results indicate that our system offers a series of advantages as a complement to the psychological therapy of pediatric transplant patients.</p>","PeriodicalId":55026,"journal":{"name":"IEEE Computer Graphics and Applications","volume":"45 6","pages":"130-140"},"PeriodicalIF":1.4,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145497501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
How to Reject a VIS Paper, or Not?
Pub Date : 2025-11-01 | DOI: 10.1109/MCG.2025.3594817
Min Chen, David Ebert, Theresa-Marie Rhyne
While it is necessary for most (if not all) visualization and visual analytics (VIS) publication venues to use peer review processes to assure the quality of the papers to be published, it is also necessary for the VIS community to appraise and improve the quality of peer review processes from time to time. In recent years, rejecting a VIS paper seems to have become rather easy, as many rejection reasons are available to criticize a given paper. In this article, we analyze possible causes of this phenomenon and recommend possible remedies. In particular, over the past decades, the visualization field has rapidly grown to include many types of contributions and specialized research areas. Given this large landscape of topics, we need to ensure that good contributions within each area are reviewed properly, published, and built upon to make significant advancement in the area concerned. Therefore, it is crucial that our review process applies specific criteria for each area and does not expect individual publications to satisfy many review criteria designed for other areas. In this way, we hope VIS review processes will enable more VIS research with X factors (original, innovative, significant, impactful, rigorous, insightful, or inspirational) to be published promptly, allowing VIS researchers and practitioners to make even more impactful contributions to data sciences.
{"title":"How to Reject a VIS Paper, or Not?","authors":"Min Chen, David Ebert, Theresa-Marie Rhyne","doi":"10.1109/MCG.2025.3594817","DOIUrl":"10.1109/MCG.2025.3594817","url":null,"abstract":"<p><p>While it is necessary for most (if not all) visualization and visual analytics (VIS) publication venues to use peer review processes to assure the quality of the papers to be published, it is also necessary for the VIS community to appraise and improve the quality of peer review processes from time to time. In recent years, rejecting a VIS paper seems to have become rather easy, as many rejection reasons are available to criticize a given paper. In this article, we analyze possible causes of this phenomenon and recommend possible remedies. In particular, over the past decades, the visualization field has rapidly grown to include many types of contributions and specialized research areas. Given this large landscape of topics, we need to ensure that good contributions within each area are reviewed properly, published, and built upon to make significant advancement in the area concerned. Therefore, it is crucial that our review process applies specific criteria for each area and does not expect individual publications to satisfy many review criteria designed for other areas. In this way, we hope VIS review processes will enable more VIS research with X factors (original, innovative, significant, impactful, rigorous, insightful, or inspirational) to be published promptly, allowing VIS researchers and practitioners to make even more impactful contributions to data sciences.</p>","PeriodicalId":55026,"journal":{"name":"IEEE Computer Graphics and Applications","volume":"45 6","pages":"101-111"},"PeriodicalIF":1.4,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145497561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FashionCook: A Visual Analytics System for Human-AI Collaboration in Fashion E-Commerce Design
Pub Date : 2025-11-01 | DOI: 10.1109/MCG.2025.3597849
Yuheng Shao, Shiyi Liu, Gongyan Chen, Ruofei Ma, Xingbo Wang, Quan Li
Fashion e-commerce design requires the integration of creativity, functionality, and responsiveness to user preferences. While AI offers valuable support, generative models often miss the nuances of user experience, and task-specific models, although more accurate, lack transparency and real-world adaptability, especially with complex multimodal data. These issues reduce designers' trust and hinder effective AI integration. To address this, we present FashionCook, a visual analytics system designed to support human-AI collaboration in the context of fashion e-commerce. The system bridges communication among model builders, designers, and marketers by providing transparent model interpretations, "what-if" scenario exploration, and iterative feedback mechanisms. We validate the system through two real-world case studies and a user study, demonstrating how FashionCook enhances collaborative workflows and improves design outcomes in data-driven fashion e-commerce environments.
{"title":"FashionCook: A Visual Analytics System for Human-AI Collaboration in Fashion E-Commerce Design.","authors":"Yuheng Shao, Shiyi Liu, Gongyan Chen, Ruofei Ma, Xingbo Wang, Quan Li","doi":"10.1109/MCG.2025.3597849","DOIUrl":"10.1109/MCG.2025.3597849","url":null,"abstract":"<p><p>Fashion e-commerce design requires the integration of creativity, functionality, and responsiveness to user preferences. While AI offers valuable support, generative models often miss the nuances of user experience, and task-specific models, although more accurate, lack transparency and real-world adaptability-especially with complex multimodal data. These issues reduce designers' trust and hinder effective AI integration. To address this, we present FashionCook, a visual analytics system designed to support human-AI collaboration in the context of fashion e-commerce. The system bridges communication among model builders, designers, and marketers by providing transparent model interpretations, \"what-if\" scenario exploration, and iterative feedback mechanisms. We validate the system through two real-world case studies and a user study, demonstrating how FashionCook enhances collaborative workflows and improves design outcomes in data-driven fashion e-commerce environments.</p>","PeriodicalId":55026,"journal":{"name":"IEEE Computer Graphics and Applications","volume":"PP ","pages":"61-75"},"PeriodicalIF":1.4,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144979300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
How Visually Literate Are Large Language Models? Reflections on Recent Advances and Future Directions
Pub Date : 2025-11-01 | DOI: 10.1109/MCG.2025.3605029
Alexander Bendeck, John Stasko, Rahul C Basole, Francesco Ferrise
Large language models (LLMs) are now being applied to the tasks of visualization generation and understanding, demonstrating these models' ability to be "visually literate." On the generation side, LLMs have shown promise in powering natural language interfaces for visualization authoring, while also suffering from usability and inconsistency issues. On the interpretation side, models (especially vision-language models) can answer basic questions about visualizations, synthesize visual and textual information, and detect misleading visual designs. However, models also tend to struggle with certain analytic tasks, and their takeaways from reading visualizations often differ from those of humans. We aim to both illuminate the state of the art in LLMs' visualization literacy and speculate on where such work may, and perhaps ought to, take us next.
{"title":"How Visually Literate Are Large Language Models? Reflections on Recent Advances and Future Directions.","authors":"Alexander Bendeck, John Stasko, Rahul C Basole, Francesco Ferrise","doi":"10.1109/MCG.2025.3605029","DOIUrl":"10.1109/MCG.2025.3605029","url":null,"abstract":"<p><p>Large language models (LLMs) are now being applied to the tasks of visualization generation and understanding, demonstrating these models' ability to be \"visually literate.\" On the generation side, LLMs have shown promise in powering natural languages' interfaces for visualization authoring while also suffering from usability and inconsistency issues. On the interpretation side, models (especially vision-language models) can answer basic questions about visualizations, synthesize visual and textual information, and detect misleading visual designs. However, models also tend to struggle with certain analytic tasks, and their takeaways from reading visualizations often differ from those of humans. We aim to both illuminate the state of the art in LLMs' visualization literacy and speculate on where such work may, and perhaps ought to, take us next.</p>","PeriodicalId":55026,"journal":{"name":"IEEE Computer Graphics and Applications","volume":"45 6","pages":"120-129"},"PeriodicalIF":1.4,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145497505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}