Clumppling: cluster matching and permutation program with integer linear programming

Bioinformatics (IF 5.4; Zone 3, Biology; JCR Q1, Biochemical Research Methods) Pub Date: 2023-12-14 DOI: 10.1093/bioinformatics/btad751

Xiran Liu, Naama M Kopelman, Noah A Rosenberg

Abstract

Motivation: In the mixed-membership unsupervised clustering analyses commonly used in population genetics, multiple replicate data analyses can differ in their clustering solutions. Combinatorial algorithms assist in aligning clustering outputs from multiple replicates, so that clustering solutions can be interpreted and combined across replicates. Although several algorithms have been introduced, challenges exist in achieving optimal alignments and performing alignments in reasonable computation time.

Results: We present Clumppling, a method for aligning replicate solutions in mixed-membership unsupervised clustering. The method uses integer linear programming for finding optimal alignments, embedding the cluster alignment problem in standard combinatorial optimization frameworks. In example analyses, we find that it achieves solutions with preferred values of a desired objective function relative to those achieved by Pong, and that it proceeds with less computation time than Clumpak. It is also the first method to permit alignments across replicates with multiple arbitrary values of the number of clusters K.

Availability: Clumppling is available at https://github.com/PopGenClustering/Clumppling.

Supplementary information: Supplementary data are available online.
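To make the alignment problem in the abstract concrete, here is a minimal sketch of matching the cluster labels of one replicate to another when both use the same K. It is not Clumppling's code and does not use integer linear programming: it enumerates every permutation, which reaches the same optimum an exact solver would for small K but scales factorially. The objective (sum of squared differences between membership matrices), the function names, and the toy data are all assumptions made for illustration; the abstract does not specify them.

```python
from itertools import permutations


def align_replicate(reference, replicate):
    """Find the column permutation of `replicate` closest to `reference`.

    Both arguments are membership matrices (one row per individual, one
    column per cluster). Returns (perm, cost), where perm[k] is the
    replicate column matched to reference column k.
    Assumed objective: sum of squared differences (illustrative only).
    """
    n_clusters = len(reference[0])
    best_perm, best_cost = None, float("inf")
    # Exhaustive search over all K! relabelings; feasible only for small K.
    for perm in permutations(range(n_clusters)):
        cost = sum(
            (ref_row[k] - rep_row[perm[k]]) ** 2
            for ref_row, rep_row in zip(reference, replicate)
            for k in range(n_clusters)
        )
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost


def permute_columns(matrix, perm):
    """Reorder the columns of `matrix` so that new column k is old column perm[k]."""
    return [[row[j] for j in perm] for row in matrix]


# Toy data (hypothetical): 4 individuals, K = 3.
reference = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
# Same clustering as `reference`, but with the cluster labels switched.
replicate = [
    [0.1, 0.1, 0.8],
    [0.2, 0.1, 0.7],
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
]

perm, cost = align_replicate(reference, replicate)
aligned = permute_columns(replicate, perm)
```

In this toy case the replicate differs from the reference only by label switching, so the best permutation recovers the reference exactly with zero cost. An integer linear programming formulation, as the abstract describes, would express the same matching with binary assignment variables rather than enumeration.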

Source journal: Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 11.20
Self-citation rate: 5.20%
Articles published: 753
Review time: 2.1 months

Journal description: The leading journal in its field, Bioinformatics publishes the highest quality scientific papers and review articles of interest to academic and industrial researchers. Its main focus is on new developments in genome bioinformatics and computational biology. Two distinct sections within the journal, Discovery Notes and Application Notes, focus on shorter papers: the former reports biologically interesting discoveries made using computational methods, and the latter explores the applications used for experiments.