Improving the t-SNE Algorithms for Cytometry and Other Technologies: Cen-Se' Mapping

C. B. Bagwell, C. Bray, D. Herbert, Beth L. Hill, M. Inokuma, Gregory T. Stelzer, B. Hunsberger
Journal of Biometrics & Biostatistics, 10(1), pp. 1-13. Published 2019-05-20. DOI: 10.4172/2155-6180.1000430 (https://doi.org/10.4172/2155-6180.1000430)
Citations: 3

Abstract

Improving the t-SNE Algorithms for Cytometry and Other Technologies: Cen-Se' Mapping
SNE methods are a set of 9 to 10 interconnected algorithms that map high-dimensional data into low-dimensional space while minimizing loss of information. Each step in this process is important for producing high-quality maps. Cen-se′™ mapping not only enhances many of the steps in this process but also fundamentally changes the underlying mathematics to produce high-quality maps. The key mathematical enhancement is to use the Cauchy distribution to create both the high-dimensional and the low-dimensional similarity matrices. This simple change eliminates the need for perplexity and entropy and results in maps that optimally separate clusters defined in high-dimensional space. It also eliminates the loss of cluster resolution commonly seen with t-SNE at higher numbers of events. There is just one free parameter for Cen-se′ mapping, and that parameter rarely needs to change. Other enhancements include a relatively low memory footprint, a highly threaded implementation, and a final classification step that can process millions of events in seconds. When the Cen-se′ mapping system is integrated with probability state modeling, the clusters of events are positioned in a reproducible manner and are colored, labeled, and enumerated automatically. We provide a simple, step-by-step example that describes how the Cen-se′ method works and how it differs from the t-SNE method. We present data from several experiments to compare the two mapping strategies on high-dimensional mass cytometry data. We provide a section on information theory to explain how the steepest gradient equations were formulated and how they control the movement of the low-dimensional points as the system renders the map. Since existing implementations of the t-SNE algorithm can easily be modified with many of these enhancements, this work should result in more effective use of this very exciting and far-reaching new technology.
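The abstract does not give the Cen-se′ equations, so the following is only a rough sketch of the central idea: building both the high-dimensional and the low-dimensional similarity matrices from a Cauchy kernel, with no per-point perplexity search. The kernel form `1 / (1 + d²/scale²)`, the single `scale` parameter, the function names, and the use of a Kullback-Leibler loss (borrowed from standard t-SNE) are all assumptions made for illustration, not the authors' implementation.

```python
import math

def cauchy_similarities(points, scale=1.0):
    """Normalized pairwise similarities from a Cauchy kernel.

    Assumed form: w_ij = 1 / (1 + d_ij^2 / scale^2), normalized so that
    all off-diagonal entries sum to 1. `scale` is a hypothetical stand-in
    for a single free parameter; no perplexity search is involved.
    """
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w = 1.0 / (1.0 + d2 / scale ** 2)
            weights[i][j] = weights[j][i] = w
            total += 2.0 * w
    return [[w / total for w in row] for row in weights]

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) over off-diagonal entries, the loss used by standard t-SNE."""
    n = len(p)
    return sum(
        p[i][j] * math.log((p[i][j] + eps) / (q[i][j] + eps))
        for i in range(n) for j in range(n) if i != j
    )

# Toy data: two tight clusters in four dimensions.
high = [(0.0, 0.0, 0.0, 0.0), (0.1, 0.0, 0.1, 0.0),
        (5.0, 5.0, 5.0, 5.0), (5.1, 5.0, 4.9, 5.0)]
# A 2-D map that keeps the clusters together, and one that scrambles them.
good_map = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.1, 4.9)]
bad_map = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.1), (5.1, 4.9)]

P = cauchy_similarities(high)
loss_good = kl_divergence(P, cauchy_similarities(good_map))
loss_bad = kl_divergence(P, cauchy_similarities(bad_map))
print(loss_good < loss_bad)
```

In this toy setup, the map that preserves the two clusters yields the lower loss, which is the quantity a gradient-based mapper would try to reduce. Standard t-SNE differs in that it uses a Gaussian kernel with perplexity-tuned bandwidths on the high-dimensional side and the Cauchy (Student-t with one degree of freedom) kernel only on the low-dimensional side.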