Correlation-Aware Object Placement for Multi-Object Operations

Ming Zhong, Kai Shen, J. Seiferas
28th International Conference on Distributed Computing Systems (ICDCS 2008), published 17 June 2008. DOI: 10.1109/ICDCS.2008.60
Citations: 15

Abstract

A multi-object operation incurs communication or synchronization overhead when the requested objects are distributed over different nodes. The object pair correlations (the probability that a pair of objects is requested together in an operation) are often highly skewed and yet stable over time for real-world distributed applications. Thus, placing strongly correlated objects on the same node (subject to node space constraints) tends to reduce communication overhead for multi-object operations. This paper studies the optimization of correlation-aware data placement. First, we formalize a restricted form of the problem as a variant of the classic Quadratic Assignment Problem and show that it is NP-hard. Based on a linear programming relaxation, we then propose a polynomial-time approximation algorithm that finds an object placement with communication overhead at most two times that of the optimal placement. We further show that the computation cost can be reduced by limiting the optimization scope to a relatively small number of the most important objects. We quantitatively evaluate our approach on keyword index placement for full-text search engines using real traces of 3.7 million web pages and 6.8 million search queries. Compared to correlation-oblivious random object placement, our approach reduces communication overhead by 37-86% across a range of optimization scopes and system sizes. Compared to a correlation-aware greedy approach, the reduction is 30-78%.
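To make the objective concrete, the following is a minimal sketch under stated assumptions, not the authors' method. It assumes unit-size objects, a fixed per-node capacity, and that communication overhead is the sum of pair correlations over pairs whose two objects land on different nodes. The greedy heuristic is a hypothetical illustration of correlation-aware placement; it is neither the paper's LP-relaxation 2-approximation nor necessarily the greedy baseline the authors compare against. The names `comm_overhead` and `greedy_place` are hypothetical.

```python
def comm_overhead(placement, corr):
    """Assumed cost model: sum of correlations of object pairs split across nodes.

    placement: dict mapping object -> node index
    corr: dict mapping frozenset({a, b}) -> probability a and b are requested together
    """
    return sum(w for pair, w in corr.items()
               if len({placement[o] for o in pair}) > 1)


def greedy_place(objects, corr, num_nodes, capacity):
    """Hypothetical greedy heuristic (unit-size objects, per-node capacity).

    Objects are visited in decreasing order of total correlation weight; each is
    put on the non-full node holding the most correlation with objects already there.
    """
    if len(objects) > num_nodes * capacity:
        raise ValueError("not enough node space for all objects")
    # Total correlation weight per object, used to decide visiting order.
    weight = {o: 0.0 for o in objects}
    for pair, w in corr.items():
        for o in pair:
            if o in weight:
                weight[o] += w
    order = sorted(objects, key=lambda o: (-weight[o], o))

    placement, load = {}, [0] * num_nodes
    for o in order:
        best_node, best_gain = None, -1.0
        for n in range(num_nodes):
            if load[n] >= capacity:
                continue  # node is full
            # Correlation "captured" if o joins node n.
            gain = sum(corr.get(frozenset((o, p)), 0.0)
                       for p, pn in placement.items() if pn == n)
            if gain > best_gain:
                best_node, best_gain = n, gain
        placement[o] = best_node
        load[best_node] += 1
    return placement


corr = {frozenset(("a", "b")): 0.4,
        frozenset(("c", "d")): 0.3,
        frozenset(("a", "c")): 0.05}
p = greedy_place(["a", "b", "c", "d"], corr, num_nodes=2, capacity=2)
print(p)                       # {'a': 0, 'b': 0, 'c': 1, 'd': 1}
print(comm_overhead(p, corr))  # 0.05 (only the weak a-c pair is split)
```

In this toy instance, a correlation-oblivious placement such as `{"a": 0, "c": 0, "b": 1, "d": 1}` splits both strong pairs and incurs an overhead of about 0.7. This illustrates why co-locating strongly correlated objects pays off when correlations are skewed.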