*K-means and cluster models for cancer signatures

Biomolecular Detection and Quantification (JCR Q1, Biochemistry, Genetics and Molecular Biology). Pub Date: 2017-09-01. Epub Date: 2017-08-02. DOI: 10.1016/j.bdq.2017.07.001

Zura Kakushadze , Willie Yu

Biomolecular Detection and Quantification, volume 13, pages 7-31. Publisher page: https://www.sciencedirect.com/science/article/pii/S2214753517302061

Citations: 35

Abstract




We present the *K-means clustering algorithm and its source code, extending the statistical clustering methods applied to quantitative finance in https://ssrn.com/abstract=2802753. *K-means is statistically deterministic and does not require specifying initial centers, etc. We apply *K-means to extract cancer signatures from genome data without using nonnegative matrix factorization (NMF). The computational cost of *K-means is a fraction of that of NMF. Using 1389 published samples for 14 cancer types, we find that 3 cancers (liver cancer, lung cancer and renal cell carcinoma) stand out and do not have cluster-like structures. Two clusters have especially high within-cluster correlations with 11 other cancers, indicating common underlying structures. Our approach opens a novel avenue for studying such structures. *K-means is universal and can be applied in other fields. We discuss some potential applications in quantitative finance.
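The abstract does not spell out how *K-means removes the dependence on initial centers. As a rough illustration of the general idea of making k-means outcomes reproducible, the sketch below runs plain k-means many times from random initial centers and keeps the partition that recurs most often. This is a simplified stand-in, not the authors' *K-means algorithm or source code; the function names (`kmeans`, `canonical`, `most_frequent_clustering`) and all parameters are hypothetical.

```python
import random
from collections import Counter


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means from random initial centers; returns one label per point."""
    centers = rng.sample(points, k)  # k distinct data points as starting centers
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center (squared distance).
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def canonical(labels):
    """Represent a clustering as a set of index sets, so label names do not matter."""
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, set()).add(i)
    return frozenset(frozenset(g) for g in groups.values())


def most_frequent_clustering(points, k, runs=200, seed=0):
    """Run k-means `runs` times; return the most common partition and its share of runs."""
    rng = random.Random(seed)
    counts = Counter(canonical(kmeans(points, k, rng)) for _ in range(runs))
    partition, hits = counts.most_common(1)[0]
    return partition, hits / runs


if __name__ == "__main__":
    # Toy data (an assumption for illustration): two well-separated groups on a line.
    data = [(0.0,), (0.5,), (1.0,), (10.0,), (10.5,), (11.0,)]
    partition, share = most_frequent_clustering(data, k=2, runs=50, seed=1)
    print(sorted(sorted(group) for group in partition), share)
```

The returned share indicates how stable the winning partition is across random starts; a low share suggests the data have no clear cluster structure at that value of k.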


Source journal