A parametric model for clustering single-cell mutation data
Authors: Jiaqian Yan, Jianing Xi, Zhenhua Yu
Venue: 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Published: 2022-12-06
DOI: https://doi.org/10.1109/BIBM55620.2022.9995308
Clustering tumor single-cell mutation data has become an important paradigm for deciphering tumor subclones and their evolutionary history. Such data are often heavily complicated by missing entries and by false-positive and false-negative errors. Although several computational methods have been developed for clustering binary mutation data, their accuracy still degrades on large datasets and on datasets with high sparsity, so more effective methods are needed. Here, we propose a novel method called CBM for reliably Clustering Binary Mutation data. CBM models the binary mutation data in a probabilistic framework by parameterizing the false-positive error rate, the false-negative error rate, the presence probability distribution of subclones, and the subclones' binary mutation profiles. Because the discrete parameters are difficult to optimize directly, Gibbs sampling for mixtures is employed to iteratively sample cell-to-cluster assignments and cluster centers from the posterior. Extensive evaluations on simulated and real datasets demonstrate that CBM outperforms state-of-the-art tools on several performance metrics, such as the adjusted Rand index (ARI) for clustering and accuracy for genotyping. CBM can be integrated into pipelines for reconstructing tumor evolutionary trees: detecting subclones with CBM can serve as a preliminary step for tumor subclonal tree inference, which will significantly improve the computational efficiency of phylogenetic analysis, especially on large datasets. CBM software is freely available at https://github.com/zhyu-lab/cbm.
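The sampling scheme summarized in the abstract can be illustrated with a minimal Python sketch. It is not the CBM implementation. The function names, the use of `None` for missing entries, the uniform prior on profile entries, and the choice to hold the false-positive rate `alpha`, the false-negative rate `beta`, and the subclone weights `pi` fixed are all assumptions made here for brevity.

```python
import math
import random

def entry_loglik(obs, geno, alpha, beta):
    """Log P(observed entry | true genotype); missing entries (None) contribute 0."""
    if obs is None:
        return 0.0
    if geno == 1:
        p = 1.0 - beta if obs == 1 else beta    # true mutation: a miss is a false negative
    else:
        p = alpha if obs == 1 else 1.0 - alpha  # no mutation: a call is a false positive
    return math.log(p)

def sample_from_logweights(logw, rng):
    """Draw an index with probability proportional to exp(logw[index])."""
    m = max(logw)  # subtract the max for numerical stability
    weights = [math.exp(v - m) for v in logw]
    return rng.choices(range(len(logw)), weights=weights, k=1)[0]

def gibbs_sweep(data, z, profiles, pi, alpha, beta, rng):
    """One sweep: resample every cell's cluster, then every cluster's profile."""
    n_cells, n_muts, n_clusters = len(data), len(data[0]), len(profiles)
    # Step 1: cell-to-cluster assignments given the current cluster profiles.
    for i in range(n_cells):
        logw = [
            math.log(pi[k])
            + sum(entry_loglik(data[i][j], profiles[k][j], alpha, beta)
                  for j in range(n_muts))
            for k in range(n_clusters)
        ]
        z[i] = sample_from_logweights(logw, rng)
    # Step 2: binary cluster profiles given the assignments
    # (uniform prior on each entry, an assumption of this sketch).
    for k in range(n_clusters):
        members = [i for i in range(n_cells) if z[i] == k]
        for j in range(n_muts):
            logw = [
                sum(entry_loglik(data[i][j], g, alpha, beta) for i in members)
                for g in (0, 1)
            ]
            profiles[k][j] = sample_from_logweights(logw, rng)
    return z, profiles

# Toy data: 4 cells x 4 mutations, None marks a missing entry.
rng = random.Random(0)
data = [
    [1, 1, 0, 0],
    [1, None, 0, 0],
    [0, 0, 1, 1],
    [0, 0, None, 1],
]
z = [rng.randrange(2) for _ in data]
profiles = [[rng.randrange(2) for _ in range(4)] for _ in range(2)]
for _ in range(50):
    z, profiles = gibbs_sweep(data, z, profiles, pi=[0.5, 0.5],
                              alpha=0.01, beta=0.2, rng=rng)
print(z, profiles)
```

Missing entries drop out of the likelihood instead of being imputed, which is one simple way to handle incomplete data in a model of this kind. A fuller sampler would also update `alpha`, `beta`, and `pi` and would summarize many posterior draws instead of reporting the last one.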