Tianyu Kang, Kourosh Zarringhalam, M. Kuijjer, Ping Chen, John Quackenbush, W. Ding
2018 IEEE International Conference on Data Mining (ICDM), published 2018-11-01
DOI: 10.1109/ICDM.2018.00138 (https://doi.org/10.1109/ICDM.2018.00138)
Citations: 2
Abstract
Clustering on Sparse Data in Non-overlapping Feature Space with Applications to Cancer Subtyping
This paper presents a new algorithm, Reinforced and Informed Network-based Clustering (RINC), for finding unknown groups of similar data objects in a sparse and largely non-overlapping feature space where a network structure among features can be observed. Sparse, non-overlapping unlabeled data are becoming increasingly common and available, especially in text mining and biomedical data mining. RINC inserts a domain-informed model into an otherwise model-free neural network. In particular, our approach integrates physically meaningful feature dependencies into the neural network architecture and a soft computational constraint. Our learning algorithm efficiently clusters sparse data through integrated smoothing and sparse auto-encoder learning. The informed design requires fewer samples for training, and at least part of the model becomes explainable. The architecture of the reinforced network layers smooths sparse data over the network of dependencies among features. Most importantly, through back-propagation, the weights of the reinforced smoothing layers are simultaneously constrained by the remaining sparse auto-encoder layers, which set the target values equal to the raw inputs. Empirical results demonstrate that RINC achieves improved accuracy and renders physically meaningful clustering results.
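The abstract gives no equations, so the following is only a rough sketch of the idea it describes, not the authors' implementation. It assumes a smoothing layer whose weights are masked by a feature-network adjacency matrix, followed by a one-hidden-layer auto-encoder with an L1 sparsity penalty whose reconstruction target is the raw input. All names (`smooth`, `encode`, `decode`, `loss`), the toy network, and the weight values are hypothetical.

```python
import math

def smooth(x, adjacency, weights):
    """Assumed smoothing layer: feature j receives signal only from itself
    and from its neighbours in the feature network."""
    n = len(x)
    return [
        sum(weights[i][j] * x[i] for i in range(n) if i == j or adjacency[i][j])
        for j in range(n)
    ]

def encode(x, w_enc):
    """Sigmoid hidden layer of the auto-encoder (assumed activation)."""
    return [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, x))))
            for row in w_enc]

def decode(h, w_dec):
    """Linear reconstruction layer."""
    return [sum(w * v for w, v in zip(row, h)) for row in w_dec]

def loss(x_raw, adjacency, w_smooth, w_enc, w_dec, sparsity=0.1):
    """Reconstruction error against the RAW input plus an L1 penalty on the
    hidden code. The L1 form of the sparsity term is an assumption."""
    h = encode(smooth(x_raw, adjacency, w_smooth), w_enc)
    recon = decode(h, w_dec)
    mse = sum((r - t) ** 2 for r, t in zip(recon, x_raw)) / len(x_raw)
    return mse + sparsity * sum(abs(v) for v in h)

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

# Toy feature network with two linked pairs: 0-1 and 2-3.
adjacency = [[0, 1, 0, 0],
             [1, 0, 0, 0],
             [0, 0, 0, 1],
             [0, 0, 1, 0]]
w_smooth = [[0.5] * 4 for _ in range(4)]  # uniform; masked inside smooth()

# Three samples that share no non-zero feature.
sample_a = [1.0, 0.0, 0.0, 0.0]
sample_b = [0.0, 1.0, 0.0, 0.0]
sample_c = [0.0, 0.0, 0.0, 1.0]

smooth_a = smooth(sample_a, adjacency, w_smooth)
smooth_b = smooth(sample_b, adjacency, w_smooth)
smooth_c = smooth(sample_c, adjacency, w_smooth)

print(dot(sample_a, sample_b))  # raw overlap: 0.0
print(dot(smooth_a, smooth_b))  # overlap after smoothing: 0.5
print(dot(smooth_a, smooth_c))  # unlinked features stay apart: 0.0

# Arbitrary illustrative auto-encoder weights (2 hidden units).
w_enc = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
w_dec = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(loss(sample_a, adjacency, w_smooth, w_enc, w_dec))
```

In this toy setting, samples A and B have no feature in common, yet they become comparable after smoothing because their features are linked in the network, while sample C stays separate. In a trained model the smoothing and auto-encoder weights would be updated jointly by back-propagating the loss, which the sketch does not do.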