{"title":"Large Language Models for Transforming Categorical Data to Interpretable Feature Vectors.","authors":"Karim Huesmann, Lars Linsen","doi":"10.1109/TVCG.2024.3460652","DOIUrl":null,"url":null,"abstract":"<p><p>When analyzing heterogeneous data comprising numerical and categorical attributes, it is common to treat the different data types separately or transform the categorical attributes to numerical ones. The transformation has the advantage of facilitating an integrated multi-variate analysis of all attributes. We propose a novel technique for transforming categorical data into interpretable numerical feature vectors using Large Language Models (LLMs). The LLMs are used to identify the categorical attributes' main characteristics and assign numerical values to these characteristics, thus generating a multi-dimensional feature vector. The transformation can be computed fully automatically, but due to the interpretability of the characteristics, it can also be adjusted intuitively by an end user. We provide a respective interactive tool that aims to validate and possibly improve the AI-generated outputs. Having transformed a categorical attribute, we propose novel methods for ordering and color-coding the categories based on the similarities of the feature vectors.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on visualization and computer graphics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TVCG.2024.3460652","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
When analyzing heterogeneous data comprising numerical and categorical attributes, it is common to treat the different data types separately or transform the categorical attributes to numerical ones. The transformation has the advantage of facilitating an integrated multi-variate analysis of all attributes. We propose a novel technique for transforming categorical data into interpretable numerical feature vectors using Large Language Models (LLMs). The LLMs are used to identify the categorical attributes' main characteristics and assign numerical values to these characteristics, thus generating a multi-dimensional feature vector. The transformation can be computed fully automatically, but due to the interpretability of the characteristics, it can also be adjusted intuitively by an end user. We provide a respective interactive tool that aims to validate and possibly improve the AI-generated outputs. Having transformed a categorical attribute, we propose novel methods for ordering and color-coding the categories based on the similarities of the feature vectors.