Improving the t-SNE Algorithms for Cytometry and Other Technologies: Cen-Se' Mapping

Journal of biometrics & biostatistics Pub Date : 2019-05-20 DOI:10.4172/2155-6180.1000430

C. B. Bagwell, C. Bray, D. Herbert, Beth L. Hill, M. Inokuma, Gregory T. Stelzer, B. Hunsberger

Journal of biometrics & biostatistics, volume 10, issue 1, pages 1-13.

Citations: 3

Abstract

SNE methods are a set of 9 to 10 interconnected algorithms that map high-dimensional data into low-dimensional space while minimizing loss of information. Each step in this process is important for producing high-quality maps. Cen-se′™ mapping not only enhances many of the steps in this process but also fundamentally changes the underlying mathematics to produce high-quality maps. The key mathematical enhancement is to use the Cauchy distribution to create both the high-dimensional and the low-dimensional similarity matrices. This simple change eliminates the need for perplexity and entropy and results in maps that optimally separate clusters defined in high-dimensional space. It also eliminates the loss of cluster resolution commonly seen with t-SNE at higher numbers of events. There is just one free parameter for Cen-se′ mapping, and that parameter rarely needs to change. Other enhancements include a relatively low memory footprint, a highly threaded implementation, and a final classification step that can process millions of events in seconds. When the Cen-se′ mapping system is integrated with probability state modeling, the clusters of events are positioned in a reproducible manner and are colored, labeled, and enumerated automatically. We provide a simple, step-by-step example that describes how the Cen-se′ method works and how it differs from the t-SNE method. We present data from several experiments to compare the two mapping strategies on high-dimensional mass cytometry data. We provide a section on information theory to explain how the steepest gradient equations were formulated and how they control the movement of the low-dimensional points as the system renders the map. Since existing implementations of the t-SNE algorithm can easily be modified with many of these enhancements, this work should result in more effective use of this very exciting and far-reaching new technology.
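To make the central idea concrete, the following is a minimal sketch of building both similarity matrices from a Cauchy kernel and comparing them with a Kullback-Leibler cost. It is not the authors' implementation: the abstract gives no formulas, so the kernel form `1 / (1 + d^2 / scale^2)`, the global normalization, the function names, and the `scale` parameter are all assumptions made for illustration, and `scale` should not be read as the single free parameter the abstract mentions. For contrast, standard t-SNE uses a Gaussian kernel in high-dimensional space, with a per-point bandwidth found by matching a target perplexity, and a Cauchy (Student-t with one degree of freedom) kernel only in the low-dimensional map.

```python
import numpy as np


def pairwise_sq_dists(x):
    """Squared Euclidean distances between all rows of x."""
    sq = np.sum(x * x, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    # Clip tiny negative values caused by floating-point round-off.
    np.maximum(d2, 0.0, out=d2)
    return d2


def cauchy_similarities(x, scale=1.0):
    """Joint similarity matrix from a Cauchy kernel (assumed form).

    `scale` is a hypothetical kernel width, not a parameter named in the paper.
    No perplexity search is involved: the same kernel is applied to every point.
    """
    d2 = pairwise_sq_dists(x)
    w = 1.0 / (1.0 + d2 / scale**2)
    np.fill_diagonal(w, 0.0)  # a point is not its own neighbor
    return w / w.sum()        # normalize so all entries sum to 1


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q), the cost that SNE-style methods minimize."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / (q[mask] + eps))))


# Synthetic stand-in data: 50 events, 10 measured parameters, 2-D map.
rng = np.random.default_rng(0)
high = rng.normal(size=(50, 10))
low = rng.normal(size=(50, 2))

p = cauchy_similarities(high, scale=1.0)  # high-dimensional similarities
q = cauchy_similarities(low, scale=1.0)   # low-dimensional similarities
cost = kl_divergence(p, q)
print(f"KL cost between high- and low-dimensional similarities: {cost:.4f}")
```

In an actual mapping run, the low-dimensional points would be moved iteratively along the gradient of this cost; the sketch only evaluates the cost once for a random starting layout.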

