Large Language Models for Transforming Categorical Data to Interpretable Feature Vectors.

IEEE Transactions on Visualization and Computer Graphics. Pub Date: 2024-09-30. DOI: 10.1109/TVCG.2024.3460652

Karim Huesmann, Lars Linsen


Abstract

When analyzing heterogeneous data comprising numerical and categorical attributes, it is common to treat the different data types separately or to transform the categorical attributes into numerical ones. The transformation has the advantage of facilitating an integrated multivariate analysis of all attributes. We propose a novel technique for transforming categorical data into interpretable numerical feature vectors using Large Language Models (LLMs). The LLMs identify the main characteristics of a categorical attribute and assign numerical values for these characteristics, thereby generating a multi-dimensional feature vector. The transformation can be computed fully automatically; because the characteristics are interpretable, an end user can also adjust it intuitively. We provide an interactive tool for validating and, where necessary, improving the AI-generated outputs. Once a categorical attribute has been transformed, we propose novel methods for ordering and color-coding its categories based on the similarities of their feature vectors.
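The abstract does not specify how the feature vectors are used for ordering and color-coding, so the following is only a minimal sketch under stated assumptions, not the authors' method. The `FEATURES` table (a hypothetical attribute "beverage" with hypothetical characteristics `caffeine`, `sweetness`, `alcohol`) is a hand-written placeholder standing in for LLM output. The helper names `encode`, `order_categories`, and `color_code` are hypothetical. Ordering uses a simple greedy nearest-neighbor chain over Euclidean distances, and color-coding spreads HSV hues along that ordering; both are swapped-in illustrative techniques.

```python
import colorsys
import math

# Hypothetical stand-in for LLM output: per-category scores in [0, 1] for
# three assumed characteristics. Illustrative placeholders, not paper data.
CHARACTERISTICS = ["caffeine", "sweetness", "alcohol"]
FEATURES = {
    "espresso": [0.9, 0.1, 0.0],
    "cola":     [0.4, 0.9, 0.0],
    "lemonade": [0.0, 0.8, 0.0],
    "beer":     [0.0, 0.2, 0.5],
    "wine":     [0.0, 0.3, 0.7],
}


def encode(column):
    """Replace each categorical value with its numerical feature vector."""
    return [FEATURES[value] for value in column]


def order_categories(features):
    """Greedy nearest-neighbor chain (an assumed heuristic): start at one end
    of the most distant pair, then repeatedly append the closest category
    that has not been visited yet."""
    names = sorted(features)

    def dist(a, b):
        return math.dist(features[a], features[b])

    start, _ = max(((a, b) for a in names for b in names), key=lambda p: dist(*p))
    order, remaining = [start], set(names) - {start}
    while remaining:
        # Tie-break on the name so the result is deterministic.
        nxt = min(remaining, key=lambda n: (dist(order[-1], n), n))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def color_code(order, max_hue=0.8):
    """Spread hues along the ordering so that neighboring (similar)
    categories receive similar colors; returns hex strings."""
    colors = {}
    for i, name in enumerate(order):
        hue = max_hue * i / max(len(order) - 1, 1)
        r, g, b = colorsys.hsv_to_rgb(hue, 0.7, 0.9)
        colors[name] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return colors


if __name__ == "__main__":
    order = order_categories(FEATURES)
    print(order)  # ['espresso', 'cola', 'lemonade', 'beer', 'wine']
    print(color_code(order))
```

With these placeholder vectors the chain runs from the caffeinated category through the sweet ones to the alcoholic ones, so adjacent hues mark similar categories. A greedy chain is cheap but not optimal; a real system might instead use a one-dimensional projection or a seriation method, and a perceptually uniform color space rather than HSV.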



