HINTs: Sensemaking on Large Collections of Documents With Hypergraph Visualization and INTelligent Agents

Sam Yu-Te Lee;Kwan-Liu Ma
Journal: IEEE Transactions on Visualization and Computer Graphics, vol. 31, no. 9, pp. 5532-5546
Published: 2024-09-12
DOI: 10.1109/TVCG.2024.3459961
URL: https://ieeexplore.ieee.org/document/10679530/

Abstract

Sensemaking on a large collection of documents (a corpus) is a challenging task that arises in fields such as market research, legal studies, intelligence analysis, political science, and computational linguistics. Previous work approaches this problem from topic- and entity-based perspectives, but the capability of the underlying NLP models limits its effectiveness. Recent advances in prompting with LLMs present opportunities to enhance such approaches with higher accuracy and customizability. However, poorly designed prompts and visualizations can lead users to misinterpret the visualizations and undermine the system's trustworthiness. In this paper, we address this issue by taking the user's analysis tasks and visualization goals into account in the prompt-based data extraction stage, thereby extending the concept of Model Alignment. We present HINTs, a visual analytics (VA) system that supports sensemaking on large collections of documents by combining previous entity-based and topic-based approaches. The visualization pipeline of HINTs consists of three stages. First, entities and topics are extracted from the corpus with prompts. Then, the result is modeled as a hypergraph and hierarchically clustered. Finally, an enhanced space-filling curve layout is applied to visualize the hypergraph for interactive exploration. The system further integrates an LLM-based intelligent chatbot agent into the interface to facilitate sensemaking on documents of interest. To demonstrate the generalizability and effectiveness of HINTs, we present two case studies in different domains and a comparative user study. We report our insights on the behavior patterns and challenges that arise when intelligent agents are used to facilitate sensemaking. We find that while intelligent agents can address many challenges in sensemaking, the visual hints that visualizations provide are still necessary. We discuss limitations and future work on combining interactive visualization and LLMs more deeply to better support corpus analysis.
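To make the hypergraph and space-filling-curve stages of the pipeline more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that each document acts as a hyperedge over the entities extracted from it, and it uses a plain Hilbert curve as a stand-in for the "enhanced" layout, whose details the abstract does not give. The names `build_hypergraph`, `hilbert_d2xy`, and `layout_on_curve`, the sample data, and the alphabetical node ordering (standing in for whatever ordering the clustering stage yields) are all hypothetical.

```python
from collections import defaultdict


def build_hypergraph(doc_entities):
    """Build a node -> set-of-hyperedges incidence map.

    Assumption: every document is one hyperedge connecting the
    entities that were extracted from it.
    """
    incidence = defaultdict(set)
    for doc_id, entities in doc_entities.items():
        for entity in entities:
            incidence[entity].add(doc_id)
    return dict(incidence)


def hilbert_d2xy(order, d):
    """Map index d along a Hilbert curve to (x, y) on a 2^order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so consecutive cells stay adjacent.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def layout_on_curve(ordered_nodes, order):
    """Place nodes, in the given linear order, on successive curve cells."""
    if len(ordered_nodes) > (1 << order) ** 2:
        raise ValueError("grid too small for the number of nodes")
    return {node: hilbert_d2xy(order, i) for i, node in enumerate(ordered_nodes)}


# Hypothetical extraction output: document id -> extracted entities.
docs = {
    "doc1": ["LLM", "visualization"],
    "doc2": ["visualization", "hypergraph"],
    "doc3": ["hypergraph", "clustering", "LLM"],
}
hypergraph = build_hypergraph(docs)
# Alphabetical order is only a placeholder for a clustering-derived order.
positions = layout_on_curve(sorted(hypergraph), order=1)
```

Because neighbouring indices on a Hilbert curve map to neighbouring grid cells, nodes that are close in the linear ordering end up close on screen, which is the general reason a space-filling curve suits a clustered ordering.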