*K-means and cluster models for cancer signatures

Biomolecular Detection and Quantification (Q1, Biochemistry, Genetics and Molecular Biology). Pub Date: 2017-09-01. Epub Date: 2017-08-02. DOI: 10.1016/j.bdq.2017.07.001

Zura Kakushadze, Willie Yu

Citations: 35

Abstract

We present the *K-means clustering algorithm and its source code, which extend the statistical clustering methods applied to quantitative finance in https://ssrn.com/abstract=2802753. *K-means is statistically deterministic: it does not require initial centers, etc., to be specified. We apply *K-means to extract cancer signatures from genome data without using nonnegative matrix factorization (NMF). The computational cost of *K-means is a fraction of that of NMF. Using 1389 published samples for 14 cancer types, we find that 3 cancers (liver cancer, lung cancer and renal cell carcinoma) stand out and do not have cluster-like structures. Two clusters have especially high within-cluster correlations with 11 other cancers, indicating common underlying structures. Our approach opens a novel avenue for studying such structures. *K-means is universal and can be applied in other fields. We discuss some potential applications in quantitative finance.
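The abstract does not spell out how *K-means removes the dependence on initial centers. The following is a minimal sketch under one assumption, which is ours and not taken from the paper: that statistical determinism can be obtained by running ordinary k-means many times from random starting centers, putting each resulting partition into a label-independent form, and keeping the partition that occurs most often. The names `kmeans`, `canonical` and `star_kmeans_sketch` and the parameters `runs` and `seed` are hypothetical illustrations, not the authors' code or API. The fixed seed only makes the sketch reproducible. The "statistical" part is that, given enough runs, the most frequent partition should not depend on which random starts happened to be drawn.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, max_iter=100):
    """One run of Lloyd's k-means; centers start at k randomly chosen points."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the run has converged
        labels = new_labels
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels


def canonical(labels):
    """Permutation-invariant form of a partition: a set of index sets."""
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, set()).add(i)
    return frozenset(frozenset(g) for g in groups.values())


def star_kmeans_sketch(points, k, runs=100, seed=0):
    """Hypothetical aggregation (an assumption, not the paper's method):
    run k-means `runs` times with random starts and return the most
    frequent partition together with its relative frequency."""
    if not 0 < k <= len(points):
        raise ValueError("need 0 < k <= number of points")
    rng = random.Random(seed)
    counts = Counter(canonical(kmeans(points, k, rng)) for _ in range(runs))
    partition, n = counts.most_common(1)[0]
    return sorted(sorted(g) for g in partition), n / runs


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    groups, freq = star_kmeans_sketch(pts, k=2)
    print(groups, freq)  # [[0, 1, 2], [3, 4, 5]] 1.0
```

On this toy input, every random start converges to the same two groups, so the modal partition has frequency 1.0. On real data such as mutation-count profiles, individual runs typically disagree, and the relative frequency then indicates how stable the chosen clustering is.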


