
Latest Publications in Visual Informatics

CycleGaussianAvatar: Encoding of facial details with the cycle consistency framework
IF 3.8 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-09-01 | DOI: 10.1016/j.visinf.2025.100264
Bowei Yin, Junke Zhu, Zhangjin Huang
{"title":"CycleGaussianAvatar: Encoding of facial details with the cycle consistency framework","authors":"Bowei Yin, Junke Zhu, Zhangjin Huang","doi":"10.1016/j.visinf.2025.100264","DOIUrl":"10.1016/j.visinf.2025.100264","url":null,"abstract":"","PeriodicalId":36903,"journal":{"name":"Visual Informatics","volume":"9 4","pages":"Article 100264"},"PeriodicalIF":3.8,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145520780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
FlowLLM: Large language model driven flow visualization
IF 3.8 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-09-01 | DOI: 10.1016/j.visinf.2025.100241
Zilin Li, Weihan Zhang, Jun Tao
Flow visualization is an essential tool for domain experts to understand and analyze flow fields intuitively. Over the past decades, various interactive techniques have been developed to customize flow visualization for exploration. However, these techniques usually rely on specifically designed graphical interfaces, requiring considerable learning and usage effort. Recently, FlowNL (Huang et al., 2023) introduced a natural language interface to reduce this effort, but it still struggles with natural-language ambiguity due to a lack of domain knowledge and offers only limited ability to understand context in dialogues. To address these issues, we propose an explorative flow visualization powered by a large language model that interacts with users. Our approach leverages an extensive dataset of flow-related queries to train the model, enhancing its ability to interpret a wide range of natural language expressions and maintain context over multi-turn interactions. Additionally, we introduce an advanced dialogue management system that supports interactive, continuous communication between users and the system. Our empirical evaluations demonstrate significant improvements in user engagement and accuracy of flow structure extraction. These enhancements are crucial for expanding the applicability of flow visualization systems in real-world scenarios, where effective and intuitive user interfaces are paramount.
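The abstract above describes mapping multi-turn natural-language requests to flow-visualization operations. As a rough illustration of that pipeline, the Python sketch below keeps a dialogue history and asks a language model to emit a structured command; `call_llm`, the command vocabulary, and the JSON schema are hypothetical stand-ins, not the paper's implementation.

```python
# Minimal sketch (not the authors' code): a dialogue manager that keeps
# multi-turn context and asks an LLM to translate a flow-visualization
# request into a structured command. `call_llm` is a hypothetical stub;
# plug in whatever chat-completion API you actually use.
import json
from dataclasses import dataclass, field

@dataclass
class DialogueManager:
    history: list = field(default_factory=list)  # (role, text) pairs

    def ask(self, user_query: str) -> dict:
        self.history.append(("user", user_query))
        prompt = (
            "Translate the last request into JSON with keys "
            "'action' (e.g. trace_streamlines, filter_vortices) and 'params'.\n"
            + "\n".join(f"{role}: {text}" for role, text in self.history)
        )
        reply = call_llm(prompt)          # hypothetical LLM call
        self.history.append(("assistant", reply))
        return json.loads(reply)

def call_llm(prompt: str) -> str:
    """Stand-in for a real chat-completion call; returns a canned command."""
    return '{"action": "trace_streamlines", "params": {"seed_region": "inlet"}}'

if __name__ == "__main__":
    dm = DialogueManager()
    print(dm.ask("Show me the streamlines near the inlet"))
```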
Citations: 0
InferEdit: An instruction-based system with a multimodal LLM for complex multi-target image editing
IF 3.8 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-09-01 | DOI: 10.1016/j.visinf.2025.100265
Zhiyong Huang, Yali She, MengLi Xiang, TuoJun Ding
To address the limitations of existing instruction-based image editing methods in handling complex multi-target instructions and maintaining semantic consistency, we present InferEdit, a training-free image editing system driven by a Multimodal Large Language Model (MLLM). The system parses complex multi-target instructions into sequential subtasks and performs editing iteratively through target localization and semantic reasoning. Furthermore, to adaptively select the most suitable editing models, we construct the evaluation dataset InferDataset to evaluate various editing models on three types of tasks: object removal, object replacement, and local editing. Based on a comprehensive scoring mechanism, we build Binary Search Trees (BSTs) for different editing types to facilitate model scheduling. Experiments demonstrate that InferEdit outperforms existing methods in handling complex instructions while maintaining semantic consistency and visual quality.
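As a rough illustration of score-based model scheduling with a binary search tree (the paper's actual tree construction and scoring mechanism are not detailed here), the Python sketch below inserts hypothetical editing models keyed by their benchmark scores and retrieves the best one for an object-removal task.

```python
# Minimal sketch (assumptions, not the paper's implementation): a binary
# search tree keyed on each editing model's benchmark score; at run time we
# walk to the right-most node to schedule the highest-scoring model for a
# given task type (removal / replacement / local edit).
class Node:
    def __init__(self, score, model_name):
        self.score, self.model_name = score, model_name
        self.left = self.right = None

def insert(root, score, model_name):
    if root is None:
        return Node(score, model_name)
    if score < root.score:
        root.left = insert(root.left, score, model_name)
    else:
        root.right = insert(root.right, score, model_name)
    return root

def best(root):
    while root.right is not None:   # largest key is the right-most node
        root = root.right
    return root.model_name, root.score

# Hypothetical scores from an evaluation set like InferDataset.
removal_tree = None
for score, model in [(0.71, "model_A"), (0.83, "model_B"), (0.65, "model_C")]:
    removal_tree = insert(removal_tree, score, model)

print(best(removal_tree))   # ('model_B', 0.83)
```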
Citations: 0
Interactive simulation and visual analysis of social media event dynamics with LLM-based multi-agent modeling
IF 3.8 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-09-01 | DOI: 10.1016/j.visinf.2025.100260
Zichen Cheng, Ziyue Lin, Yihang Yang, Zhongyu Wei, Siming Chen
With the increasing role of social media in information dissemination, effectively simulating and analyzing public event dynamics has become a key research focus. We present an interactive visual analysis system for simulating social media events using multi-agent models powered by large language models (LLMs). By modeling agents with diverse characteristics, the system explores how agents perceive information, adjust their emotions and stances, provide feedback, and influence the trajectory of events. The system integrates real-time interactive simulation with multi-perspective visualization, enabling users to investigate event trajectories and key influencing factors under varied configurations. Theoretical work standardizes agent attributes and interaction mechanisms, supporting realistic simulation of social media behaviors. Evaluation through indicators and case studies demonstrates the system’s effectiveness and adaptability, offering a novel tool for public event analysis across open social platforms.
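For intuition about the agent update loop such a simulation implies, here is a minimal Python sketch; the stance/emotion update rule, the susceptibility parameter, and the post representation are assumptions for illustration, whereas the paper's agents are driven by an LLM.

```python
# Minimal sketch under stated assumptions (not the authors' system): each
# agent holds a stance in [-1, 1] and an emotion intensity in [0, 1]; on
# seeing a post it nudges both toward the post, scaled by a susceptibility
# parameter. A real LLM-based agent would replace `react` with a prompt.
import random
from dataclasses import dataclass

@dataclass
class Agent:
    stance: float          # -1 = oppose, +1 = support
    emotion: float         # 0 = calm, 1 = agitated
    susceptibility: float

    def react(self, post_stance: float, post_intensity: float) -> None:
        self.stance += self.susceptibility * (post_stance - self.stance)
        self.emotion = min(1.0, self.emotion + self.susceptibility * post_intensity)

agents = [Agent(random.uniform(-1, 1), 0.2, random.uniform(0.05, 0.3))
          for _ in range(100)]

for step in range(10):                 # simulate ten rounds of exposure
    post = (+0.8, 0.5)                 # a strongly supportive, emotive post
    for agent in agents:
        agent.react(*post)

print(sum(a.stance for a in agents) / len(agents))  # mean stance drifts upward
```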
Citations: 0
CineFolio: Cinematography-guided camera planning for immersive narrative visualization
IF 3.8 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-09-01 | DOI: 10.1016/j.visinf.2025.100259
Zhan Wang, Qian Zhu, David Yip, Fugee Tsung, Wei Zeng
Narrative visualization facilitates data presentation and communicates insights, while virtual reality can further enhance immersive and engaging experiences. The combination of these two research interests shows the potential to revolutionize the way data is presented and understood. Within the realm of narrative visualization, empirical evidence has particularly highlighted the importance of camera planning. However, existing works primarily rely on user-intensive manipulation of the camera, with little effort put into automating the process. To fill the gap, this paper proposes CineFolio, a semi-automated camera planning method to reduce manual effort and enhance user experience in immersive narrative visualization. CineFolio combines cinematic theories with graphics criteria, considering both information delivery and aesthetic enjoyment to ensure a comfortable and engaging experience. Specifically, we parametrize the considerations into optimizable camera properties and solve it as a constraint satisfaction problem (CSP) to realize common camera types for narrative visualization, namely overview camera for absorbing the scale, focus camera for detailed views, moving camera for animated transitions, and user-controlled camera allowing users to provide inputs to camera planning. We demonstrate the feasibility of our approach with cases of various data and chart types. To further evaluate our approach, we conducted a within-subject user study, comparing our automated method with manual camera control, and the results confirm both effectiveness of the guided navigation and expressiveness of the cinematic design for narrative visualization.
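To make the "optimizable camera properties" idea concrete, the sketch below poses a toy version of the problem with SciPy: two camera parameters (distance and elevation) are optimized against an assumed information-plus-aesthetics cost under bound constraints. The objective, preferred values, and bounds are illustrative assumptions, not the paper's formulation.

```python
# Minimal sketch (assumed objective, not the paper's CSP): treat the camera
# distance d and elevation angle phi as the optimizable properties, penalize
# deviation from a cinematically preferred shot distance and from a gentle
# 15-degree elevation, and bound both so the target stays in frame.
import numpy as np
from scipy.optimize import minimize

PREFERRED_DISTANCE = 5.0                 # hypothetical "medium shot" distance (scene units)
PREFERRED_ELEVATION = np.radians(15)     # hypothetical comfortable elevation

def cost(x):
    d, phi = x
    info_term = (d - PREFERRED_DISTANCE) ** 2                  # information delivery
    aesthetic_term = 4.0 * (phi - PREFERRED_ELEVATION) ** 2    # framing comfort
    return info_term + aesthetic_term

result = minimize(
    cost,
    x0=[8.0, np.radians(45)],
    bounds=[(2.0, 20.0), (np.radians(5), np.radians(80))],  # keep the target visible
)
d_opt, phi_opt = result.x
print(f"distance={d_opt:.2f}, elevation={np.degrees(phi_opt):.1f} deg")
```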
Citations: 0
T-Foresight: Interpret moving strategies based on context-aware trajectory prediction
IF 3.8 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-09-01 | DOI: 10.1016/j.visinf.2025.100261
Yueqiao Chen, Jiang Wu, Yingcai Wu, Dongyu Liu
Trajectory prediction and interpretation are crucial in various domains for optimizing movements in complex environments. However, understanding how diverse contextual factors—environmental, physical, and social—influence moving strategies is challenging due to their multifaceted nature, which complicates quantification and the derivation of actionable insights. We introduce an interpretable analytics workflow that addresses these challenges by innovatively leveraging ensemble learning for context-aware trajectory prediction. Multiple base predictors simulate diverse moving strategies, while a decision-making model assesses the suitability of each predictor in specific contexts. This approach quantifies the impact of contextual factors by interpreting the decision-making model’s predictions and reveals possible moving strategies through the aggregation of base predictors’ outputs. The workflow comes with T-Foresight, an interactive visualization interface that empowers stakeholders to explore predictions, interpret contextual influences, and devise and compare moving strategies effectively. We evaluate our approach in the domain of eSports, specifically MOBA games. Through case studies with professional analysts, we demonstrate T-Foresight’s effectiveness in illustrating player moving strategies and providing insights into top-tier tactics. A user study further confirms its usefulness in helping average players uncover and understand advanced strategies.
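A minimal sketch of the ensemble idea follows: several hand-written base predictors stand in for learned moving strategies, and a softmax gate over a context vector weights their outputs. The predictors, context features, and gate weights here are all hypothetical.

```python
# Minimal sketch (assumptions, not the authors' model): several base
# predictors each encode one moving strategy; a tiny gating model scores how
# suitable each strategy is for the current context vector, and the final
# trajectory step is the suitability-weighted average of their outputs.
import numpy as np

def predictor_straight(pos, vel):   # keep the current heading
    return pos + vel

def predictor_retreat(pos, vel):    # fall back toward the origin (e.g. own base)
    return pos - 0.5 * pos / (np.linalg.norm(pos) + 1e-8)

def predictor_stop(pos, vel):       # hold position
    return pos

PREDICTORS = [predictor_straight, predictor_retreat, predictor_stop]

def gate(context, W):
    """Softmax suitability of each predictor given a context feature vector."""
    logits = W @ context
    e = np.exp(logits - logits.max())
    return e / e.sum()

rng = np.random.default_rng(0)
W = rng.normal(size=(len(PREDICTORS), 4))    # stand-in for a trained gating model
context = np.array([0.2, 0.9, 0.1, 0.5])     # e.g. health, threat, objective cues
pos, vel = np.array([10.0, 4.0]), np.array([1.0, 0.0])

weights = gate(context, W)
prediction = sum(w * p(pos, vel) for w, p in zip(weights, PREDICTORS))
print(weights, prediction)
```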
Citations: 0
Hamiltonian cycle clustering with asymmetric correlation
IF 3.8 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-08-22 | DOI: 10.1016/j.visinf.2025.100263
Tianyi Huang, Zhengjun Zhang, Shenghui Cheng
Analysts who explore high-dimensional data usually want three answers at once: Which samples belong together, how close the resulting groups are, and who influences whom accordingly. Classical clustering provides only hard labels, hiding both inter-cluster affinities and correlation flow. We introduce Hamiltonian Cycle Clustering with Asymmetric Correlation (HCC-AC), a framework that converts the clustering task into an interpretable map where structure and directionality are visible at a single glance. HCC-AC first learns soft memberships by optimizing a joint global–local loss, preserving manifold structure while turning each label into a probability. These probabilities drive a Hamiltonian-cycle embedding: cluster anchors are ordered by affinity and placed evenly on a circle; samples fall radially towards their most-likely anchor, so clusters, their similarities (arc lengths), and outliers emerge immediately. Directed arrows connect anchors, their lengths showing correlation strength, transforming the map into a legible narrative of influence. Experiments on five benchmark datasets demonstrate that HCC-AC improves knowledge discovery in clustering, i.e., it indexes the clustering results, flags outliers reliably, and uncovers correlation pathways.
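The circular embedding described above can be illustrated in a few lines of NumPy: cluster anchors are spaced evenly on a unit circle and each sample is pulled radially toward its most likely anchor in proportion to that membership probability. This is a toy reconstruction from the abstract, not the authors' code, and the soft memberships here are randomly generated.

```python
# Minimal sketch under stated assumptions (not the paper's algorithm): given
# soft cluster memberships, place the k cluster anchors evenly on a unit
# circle and pull each sample from the centre toward its most likely anchor,
# with the pull proportional to that membership probability.
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 4
memberships = rng.dirichlet(alpha=[0.3] * k, size=n)   # soft labels, rows sum to 1

angles = 2 * np.pi * np.arange(k) / k                   # anchors evenly on the circle
anchors = np.stack([np.cos(angles), np.sin(angles)], axis=1)

top = memberships.argmax(axis=1)                        # most likely anchor per sample
strength = memberships[np.arange(n), top]               # its membership probability
points = strength[:, None] * anchors[top]               # radial placement

print(anchors.round(2))
# Confident samples sit near their anchor; ambiguous ones stay near the centre.
print(points[:5].round(2))
```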
Citations: 0
Self-similarity guided regression with contrast enhancement for spine segmentation
IF 3.8 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-08-22 | DOI: 10.1016/j.visinf.2025.100268
Xiaojia Zhu, Chunyu Li, Rui Chen, Zhiwen Shao
Accurate spine segmentation is critical for scoliosis diagnosis and treatment. For instance, automatic Cobb angle measurement for scoliosis relies on precisely localized vertebral masks. However, it remains a challenging task due to low tissue contrast, blurred vertebral edges, and overlapping anatomical structures. In this paper, we propose SRNet, a pure segmentation network that produces binary masks of each vertebra. SRNet integrates two novel components: a Self-similarity Guided Dynamic Convolution (SGDC) module and a Contrast-Enhanced Boundary Decoder (CEBD). SGDC exploits the repetitive structure of vertebrae by leveraging non-local attention to compute self-similarity across feature maps and dynamic convolution to combine multiple convolution kernels adaptively. CEBD sharpens segmentation boundaries via a reverse-attention mechanism that erases the coarse prediction and focuses on missing edge details, combined with a spectral-residual filter that amplifies high-frequency edge information. Extensive experiments on the AASCE spine X-ray dataset show that our SRNet achieves a high Dice score of 92.37%, outperforming state-of-the-art approaches. While our primary focus here is mask segmentation, the accurate vertebral masks produced by SRNet could readily support future tasks such as scoliosis Cobb angle estimation.
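To illustrate the dynamic-convolution ingredient of SGDC (only that ingredient; the self-similarity attention and the paper's exact design are not reproduced), here is a small PyTorch module that mixes several parallel kernels with per-sample attention weights.

```python
# Minimal PyTorch sketch (an assumption about the SGDC idea, not the released
# code): K parallel convolution kernels whose outputs are mixed by per-sample
# attention weights derived from globally pooled features, so the effective
# kernel adapts to each input.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=4, kernel_size=3):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2)
             for _ in range(k)]
        )
        self.attn = nn.Linear(in_ch, k)   # scores one weight per kernel

    def forward(self, x):
        # per-sample kernel weights from globally pooled features
        w = F.softmax(self.attn(x.mean(dim=(2, 3))), dim=1)      # (B, K)
        outs = torch.stack([conv(x) for conv in self.convs], 1)  # (B, K, C, H, W)
        return (w[:, :, None, None, None] * outs).sum(dim=1)

x = torch.randn(2, 16, 64, 64)
print(DynamicConv(16, 32)(x).shape)   # torch.Size([2, 32, 64, 64])
```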
Citations: 0
Sequential pattern recognition in CAD operations: A deep learning framework for next-action prediction
IF 3.8 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-08-21 | DOI: 10.1016/j.visinf.2025.100266
Teerapord Lin, Paisit Khanarsa
Computer-Aided Design (CAD) systems have become essential tools in engineering and design fields. However, the complexity of these systems can create a steep learning curve and reduce efficiency for users. To address this challenge, a deep learning-based approach for predicting the next CAD command in a design sequence is proposed, leveraging sentence embeddings and sequential pattern recognition to enhance prediction accuracy. The method utilizes the Multilingual Universal Sentence Encoder (MUSE) to generate dense vector representations of CAD commands, effectively capturing semantic relationships between different design operations. These embeddings are then combined with distance features that encode the sequential patterns between consecutive commands to create a comprehensive representation of design workflows. Two neural architectures are implemented and evaluated: the Convolutional Sequence Embedding Recommendation (Caser) model and the Tiny-Transformer model, each tested with four different feature configurations (random embeddings, random embeddings with distance features, MUSE embeddings, and MUSE embeddings with distance features), resulting in eight model variants in total. Experimental results demonstrate that the CNN-based Caser model consistently outperforms the attention-based Tiny-Transformer across all configurations. The best-performing model, Caser with MUSE embeddings and distance features, achieves the highest accuracy at 0.5902 and precision at 0.5917, representing a 7.6% improvement over traditional methods and a 2.5% improvement over the best Transformer variant. Our analysis of the training dynamics reveals that models with distance features converge faster and demonstrate more stable validation loss, highlighting the complementary roles of semantic understanding and sequential pattern recognition in CAD command prediction. However, while Transformer models showed competitive baseline performance, they failed to benefit from additional feature engineering, unlike the Caser models, which effectively leveraged both semantic and sequential information. These findings show that incorporating both semantic understanding of commands and their sequential relationships significantly improves prediction accuracy, potentially enhancing user experience by providing intelligent command suggestions during the CAD design process.
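As a sketch of the feature construction described above, the snippet below pairs each command's embedding with the cosine distance to the previous command's embedding; random vectors stand in for MUSE embeddings and the command vocabulary is hypothetical.

```python
# Minimal sketch (assumptions only: random stand-ins for MUSE embeddings):
# build the per-step feature used for next-command prediction by pairing each
# command's sentence embedding with a distance feature, the cosine distance
# to the previous command's embedding.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["sketch", "extrude", "fillet", "chamfer", "revolve"]
EMB = {cmd: rng.normal(size=16) for cmd in VOCAB}   # stand-in for MUSE vectors

def cosine_distance(a, b):
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def featurize(sequence):
    feats = []
    for i, cmd in enumerate(sequence):
        emb = EMB[cmd]
        dist = 0.0 if i == 0 else cosine_distance(EMB[sequence[i - 1]], emb)
        feats.append(np.concatenate([emb, [dist]]))   # embedding + distance feature
    return np.stack(feats)

X = featurize(["sketch", "extrude", "fillet", "extrude"])
print(X.shape)   # (4, 17): 16-dim embedding plus 1 distance feature per step
```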
Citations: 0
HuGe: Towards Human-controllable image Generation in autonomous driving
IF 3.8 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-08-09 | DOI: 10.1016/j.visinf.2025.100262
Yuanzhi Zeng, Shiwei Chen, Yutian Zhang, Dong Sun, Yong Wang, Haipeng Zeng
The rapid advancement of autonomous driving technology has reshaped the automotive industry, highlighting the need for diverse and high-quality image data. Existing image datasets for training and improving autonomous driving technologies lack rare scenarios like extreme weather, limiting the effectiveness and reliability of autonomous driving technologies. One possible way of expanding the dataset coverage is to augment the existing dataset with artificial ones, which, however, still suffers from various challenges like limited controllability and unclear corner case boundaries. To address these challenges, we design and develop an interactive visual analysis system, HuGe, to achieve efficient and semi-automatic controllable image generation. HuGe incorporates weather transformation models and a novel semi-automatic knowledge-based controllable object insertion method which leverages the controllability of convex optimization and the variability of diffusion models. We formulate the design requirements, propose an effective framework, and design four coordinated views to support controllable image generation, multidimensional dataset analysis, and evaluation of the generated samples. Two case studies, a metric-based evaluation and interviews with domain experts demonstrate the practicality and effectiveness of HuGe in controllable image generation for autonomous driving.
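As a toy illustration of using convex optimization for controllable object insertion, the sketch below solves a small quadratic program that keeps an inserted object inside assumed lane bounds while staying close to a user-suggested position; the coordinates, bounds, and objective are illustrative assumptions, not the authors' formulation.

```python
# Minimal sketch (an assumption about the convex-optimization step, not the
# authors' method): choose an insertion point for a new object that stays
# inside the drivable lane (box constraints) and is as close as possible to a
# user-suggested location, i.e. a small convex quadratic program.
import numpy as np
from scipy.optimize import minimize

user_suggested = np.array([3.5, 40.0])        # (lateral, longitudinal) in metres
lane_bounds = [(-1.75, 1.75), (10.0, 60.0)]   # stay in lane, ahead of the ego car

objective = lambda p: float(np.sum((p - user_suggested) ** 2))
result = minimize(objective, x0=[0.0, 30.0], bounds=lane_bounds)
print(result.x)   # ~[1.75, 40.0]: lateral position clipped to the lane edge
```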
Citations: 0