Supporting Dynamic Quantization for High-Dimensional Data Analytics.

Gheorghi Guzun, Guadalupe Canahuate
Published in: Proceedings of the ExploreDB'17, International Workshop on Exploratory Search in Databases and the Web (4th : 2017 : Chicago, Ill.). Publication date: 2017-05-01. DOI: 10.1145/3077331.3077336
Citations: 4

Abstract

Similarity searches are at the heart of exploratory data analysis tasks. Distance metrics are typically used to characterize the similarity between data objects represented as feature vectors. However, when the dimensionality of the data increases and the number of features is large, traditional distance metrics fail to distinguish between the closest and furthest data points. Localized distance functions have been proposed as an alternative to traditional distance metrics. These functions consider only the dimensions close to the query when computing the distance/similarity. Furthermore, to enable interactive exploration of high-dimensional data, indexing support for ad-hoc queries is needed. In this work we set out to investigate whether bit-sliced indices can be used for exploratory analytics such as similarity searches and data clustering for high-dimensional big data. We also propose a novel dynamic quantization called Query-dependent Equi-Depth (QED) quantization and show its effectiveness in characterizing high-dimensional similarity. When applying QED we observe improvements in kNN classification accuracy over traditional distance functions.

ACM Reference Format: Gheorghi Guzun and Guadalupe Canahuate. 2017. Supporting Dynamic Quantization for High-Dimensional Data Analytics. In Proceedings of ExploreDB'17, Chicago, IL, USA, May 14-19, 2017, 6 pages. https://doi.org/10.1145/3077331.3077336
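
The abstract's point about localized distance functions can be illustrated with a small sketch. The abstract does not define QED, so the code below is only one possible reading of "query-dependent equi-depth", stated as an assumption: in each dimension, the `depth` rows nearest to the query form a query-centered bin, and a row's similarity is the number of dimensions in which it falls inside that bin. The function names, the `depth` parameter, and the toy data are all hypothetical and are not taken from the paper. The sketch also uses plain lists rather than the bit-sliced indices the paper investigates.

```python
import math
from typing import List, Sequence

Row = Sequence[float]


def localized_similarity(data: Sequence[Row], query: Row, depth: int) -> List[int]:
    """Count, per row, the dimensions in which the row is among the
    `depth` rows nearest to the query (ties broken by row order).

    Illustrative assumption only, not the paper's definition of QED."""
    scores = [0] * len(data)
    for d in range(len(query)):
        # Rank rows by their 1-D distance to the query in dimension d.
        order = sorted(range(len(data)), key=lambda r: abs(data[r][d] - query[d]))
        # The `depth` nearest rows form the query-centered bin for dimension d.
        for r in order[:depth]:
            scores[r] += 1
    return scores


def rank_localized(data: Sequence[Row], query: Row, depth: int, k: int) -> List[int]:
    """Return the k row indices with the highest localized similarity."""
    scores = localized_similarity(data, query, depth)
    return sorted(range(len(data)), key=lambda r: -scores[r])[:k]


def rank_euclidean(data: Sequence[Row], query: Row, k: int) -> List[int]:
    """Return the k row indices with the smallest Euclidean distance."""
    return sorted(range(len(data)), key=lambda r: math.dist(data[r], query))[:k]


# Hypothetical toy data: row 0 matches the query in two of three
# dimensions but has one outlying value; row 2 is moderately far in
# every dimension.
DATA = [
    [10, 10, 90],
    [11, 11, 10],
    [50, 50, 50],
    [12, 80, 13],
]
QUERY = [10, 10, 10]

if __name__ == "__main__":
    print(localized_similarity(DATA, QUERY, depth=2))  # [2, 3, 0, 1]
    print(rank_localized(DATA, QUERY, depth=2, k=2))   # [1, 0]
    print(rank_euclidean(DATA, QUERY, k=4))            # [1, 2, 3, 0]
```

On this toy data, Euclidean distance ranks row 2 ahead of row 0 because a single outlying dimension dominates the sum, whereas the localized count rewards row 0 for agreeing with the query in two dimensions. Whether the paper's QED behaves this way on real data is not something this sketch can show.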
