Imbalanced Hypergraph Partitioning and Improvements for Consensus Clustering

John Robert Yaros, T. Imielinski
DOI: 10.1109/ICTAI.2013.61
Published in: 2013 IEEE 25th International Conference on Tools with Artificial Intelligence
Publication date: 2013-11-04
Citations: 11

Abstract

Hypergraph partitioning is typically defined as an optimization problem wherein vertices are placed in separate parts (of a partition) such that the fewest number of hyperedges span multiple parts. To ensure that parts have sizes satisfying user requirements, constraints are typically imposed. Under such constraints, the problem is known to be NP-hard, so heuristic methods are needed to find approximate solutions in reasonable time. Circuit layout has historically been one of the most prominent application areas and has seen a proliferation of tools designed to satisfy its needs. Constraints in these tools typically focus on equal-sized parts, allowing the user to specify a maximum tolerance for deviation from that equal size. A more generalized constraint allows the user to define fixed sizes and tolerances for each part. More recently, other domains have mapped problems to hypergraph partitioning and, perhaps due to their availability, have used existing tools to perform partitioning. In particular, consensus clustering fits a hypergraph representation naturally: each cluster of each input clustering is represented by a hyperedge. Authors of such research have reported that partitioning tends to produce good results only when clusters can be expected to be roughly the same size, an unsurprising outcome given the tools' focus on equal-sized parts. Thus, even though many datasets have "natural" part sizes that are mixed, the current toolset is ill-suited to finding good solutions unless those part sizes are known a priori. We argue that the main issue rests in the current constraint definitions and their focus on measuring imbalance on the basis of the largest/smallest part. We further argue that, due to its holistic nature, entropy best measures imbalance and can best guide the partitioning method to the natural part sizes with lowest cut for a given level of imbalance.
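The abstract's central contrast is between balance constraints defined by the extreme (largest/smallest) part and a holistic entropy measure over all part sizes. As an illustrative sketch (not the authors' exact formulation), the Shannon entropy of the part-size distribution can be computed as follows; a perfectly balanced k-way partition attains the maximum of log2(k) bits, and any skew lowers the score regardless of which single part is extreme:

```python
from math import log2

def size_entropy(part_sizes):
    """Shannon entropy (bits) of the distribution of vertices over parts."""
    n = sum(part_sizes)
    return -sum((s / n) * log2(s / n) for s in part_sizes if s > 0)

# A perfectly balanced 4-way partition reaches the maximum, log2(4) = 2 bits;
# a skewed partition scores lower even though it has the same number of parts.
balanced = [25, 25, 25, 25]
skewed = [70, 10, 10, 10]
```

Under an entropy constraint, a partitioner may trade size among *all* parts freely as long as the overall entropy stays above a user-chosen floor, rather than policing each part against a fixed target size.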
We provide a method that finds good approximate solutions under an entropy constraint and further introduce the notion of a discount cut, which helps overcome the local optima that frequently plague k-way partitioning algorithms. In comparison to today's popular tools, we show that our method returns sizable improvements in cut size as the level of imbalance grows. In consensus clustering, we demonstrate that good solutions are more easily achieved even when part sizes are not roughly equal.
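The consensus-clustering mapping described above (each cluster of each input clustering becomes a hyperedge) can be sketched in a few lines; the data and variable names here are hypothetical toy choices, and the cut measure shown is the plain "spanning" cut from the abstract's opening definition, not the paper's discount cut:

```python
def clusterings_to_hyperedges(clusterings):
    """Each cluster of each input clustering becomes one hyperedge
    (a frozenset of the data-point indices assigned to that cluster)."""
    edges = []
    for labels in clusterings:
        clusters = {}
        for point, label in enumerate(labels):
            clusters.setdefault(label, set()).add(point)
        edges.extend(frozenset(c) for c in clusters.values())
    return edges

def cut_size(edges, partition):
    """Number of hyperedges spanning more than one part, given a
    point -> part assignment list."""
    return sum(1 for e in edges if len({partition[v] for v in e}) > 1)

# Two toy input clusterings of the same 6 points (labels per point).
clusterings = [[0, 0, 0, 1, 1, 1],
               [0, 0, 1, 1, 2, 2]]
edges = clusterings_to_hyperedges(clusterings)  # 2 + 3 = 5 hyperedges
consensus = [0, 0, 0, 1, 1, 1]  # a candidate consensus partition
```

A consensus partition that agrees well with the input clusterings leaves few hyperedges spanning parts; here only the cluster {2, 3} from the second clustering is cut, so minimizing the cut recovers a partition most inputs agree on.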