{"title":"Exploratory Data Analysis and Searching Cliques in Graphs","authors":"András Hubai, Sándor Szabó, Bogdán Zaválnij","doi":"10.3390/a17030112","DOIUrl":null,"url":null,"abstract":"The principal component analysis is a well-known and widely used technique to determine the essential dimension of a data set. Broadly speaking, it aims to find a low-dimensional linear manifold that retains a large part of the information contained in the original data set. It may be the case that one cannot approximate the entirety of the original data set using a single low-dimensional linear manifold even though large subsets of it are amenable to such approximations. For these cases we raise the related but different challenge (problem) of locating subsets of a high dimensional data set that are approximately 1-dimensional. Naturally, we are interested in the largest of such subsets. We propose a method for finding these 1-dimensional manifolds by finding cliques in a purpose-built auxiliary graph.","PeriodicalId":502609,"journal":{"name":"Algorithms","volume":"32 12","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Algorithms","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/a17030112","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
The principal component analysis is a well-known and widely used technique to determine the essential dimension of a data set. Broadly speaking, it aims to find a low-dimensional linear manifold that retains a large part of the information contained in the original data set. It may be the case that one cannot approximate the entirety of the original data set using a single low-dimensional linear manifold even though large subsets of it are amenable to such approximations. For these cases we raise the related but different challenge (problem) of locating subsets of a high dimensional data set that are approximately 1-dimensional. Naturally, we are interested in the largest of such subsets. We propose a method for finding these 1-dimensional manifolds by finding cliques in a purpose-built auxiliary graph.