New guidance for using t-SNE: Alternative defaults, hyperparameter selection automation, and comparative evaluation

Visual Informatics · IF 3.8 · Zone 3 (Computer Science) · JCR Q2 (Computer Science, Information Systems) · Pub Date: 2022-06-01 · DOI: 10.1016/j.visinf.2022.04.003
Robert Gove, Lucas Cadalzo, Nicholas Leiby, Jedediah M. Singer, Alexander Zaitzeff
Citations: 13

Abstract

We present new guidelines for choosing hyperparameters for t-SNE and an evaluation comparing these guidelines to current ones. These guidelines include a proposed empirically optimum guideline derived from a t-SNE hyperparameter grid search over a large collection of data sets. We also introduce a new method to featurize data sets using graph-based metrics called scagnostics; we use these features to train a neural network that predicts optimal t-SNE hyperparameters for the respective data set. This neural network has the potential to simplify the use of t-SNE by removing guesswork about which hyperparameters will produce the best embedding. We evaluate and compare our neural network-derived and empirically optimum hyperparameters to several other t-SNE hyperparameter guidelines from the literature on 68 data sets. The hyperparameters predicted by our neural network yield embeddings with accuracy similar to that of the best current t-SNE guidelines. Using our empirically optimum hyperparameters is simpler than following previously published guidelines but yields more accurate embeddings, in some cases by a statistically significant margin. We find that the useful ranges for t-SNE hyperparameters are narrower and include smaller values than previously reported in the literature. Importantly, we also quantify the potential for future improvements in this area: using data from a grid search of t-SNE hyperparameters, we find that an optimal selection method could improve embedding accuracy by up to two percentage points over the methods examined in this paper.
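The grid search mentioned in the abstract can be pictured with a minimal sketch. This is not the authors' code: the grid values, the helper name `grid_search`, and the scoring function `toy_score` are all hypothetical and chosen only for illustration. In a real run, the scoring function would fit t-SNE with the given hyperparameters and return an embedding-quality score; here a synthetic stand-in keeps the example dependency-free.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Params = Dict[str, float]


def grid_search(
    grid: Dict[str, Sequence[float]],
    score_fn: Callable[[Params], float],
) -> Tuple[Params, float, List[Tuple[Params, float]]]:
    """Score every hyperparameter combination in `grid`; return the best one.

    Returns (best_params, best_score, all_results).
    """
    names = list(grid)
    results: List[Tuple[Params, float]] = []
    # Cartesian product of all candidate values, one combination at a time.
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, score_fn(params)))
    best_params, best_score = max(results, key=lambda item: item[1])
    return best_params, best_score, results


def toy_score(params: Params) -> float:
    """Synthetic stand-in for an embedding-accuracy metric (NOT from the paper).

    It peaks at perplexity=16, learning_rate=100 purely by construction.
    """
    return (
        1.0
        - abs(params["perplexity"] - 16) / 100
        - abs(params["learning_rate"] - 100) / 1000
    )


# Hypothetical candidate values; the paper's actual grid is not given here.
GRID = {
    "perplexity": [2, 8, 16, 32, 64],
    "learning_rate": [10, 100, 1000],
}

best_params, best_score, results = grid_search(GRID, toy_score)
print(best_params, best_score)
```

Keeping the scorer pluggable means the same loop could be reused per data set, which is one way to compare a fixed guideline against the best combination found on the grid.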

Source journal: Visual Informatics (Computer Science: Computer Graphics and Computer-Aided Design)
CiteScore: 6.70
Self-citation rate: 3.30%
Articles published: 33
Review time: 79 days