Reducing Scalability Collapse via Requester-Based Locking on Multicore Systems

2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems
Pub Date: 2012-08-07
DOI: 10.1109/MASCOTS.2012.42

Yan Cui, Yingxin Wang, Yu Chen, Yuanchun Shi, Wei Han, Xin Liao, Fei Wang

Citations: 4

Abstract

In response to the increasing ubiquity of multicore processors, there has been widespread development of multithreaded applications that strive to realize the full potential of these processors. Unfortunately, lock contention within operating systems can limit the scalability of multicore systems so severely that an increase in the number of cores can actually lead to reduced performance (i.e., scalability collapse). Existing lock implementations have disadvantages in scalability, resource utilization, and energy efficiency. In this work, we observe that the number of tasks requesting a lock correlates significantly with the occurrence of scalability collapse. Based on this observation, we propose a novel lock implementation that allows tasks blocked on a lock to either spin or remain in a power-saving state according to the number of lock requesters. We call our lock implementation a requester-based lock and implement it in the Linux kernel to replace its default spin lock. Based on the results of an analysis, we find that the best policy for a task waiting for a lock to become free is to enter the power-saving state immediately after noticing that the lock cannot be acquired. Our requester-based lock scheme is evaluated using micro- and macro-benchmarks on AMD 32-core and Intel 40-core systems. Experimental results indicate that our lock scheme removes scalability collapse completely for most applications. Furthermore, our method shows better scalability and energy efficiency than mutex locks and adaptive locks.
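
The core idea in the abstract, choosing between spinning and a power-saving wait based on how many tasks are requesting the lock, can be illustrated with a small user-space sketch. This is not the authors' Linux kernel implementation. The class name `RequesterBasedLock`, the helper `choose_wait_policy`, and the `spin_threshold` parameter are all hypothetical, and a blocking acquire on a Python `threading.Lock` stands in for the power-saving state, which is an assumption made only for illustration.

```python
import threading
import time


def choose_wait_policy(requesters, spin_threshold):
    """Hypothetical policy: spin while few tasks request the lock, else sleep."""
    return "spin" if requesters <= spin_threshold else "sleep"


class RequesterBasedLock:
    """Toy model of a lock whose waiters spin or sleep by requester count.

    Assumption: "requesters" counts the current holder plus all waiters.
    """

    def __init__(self, spin_threshold=2):
        self._lock = threading.Lock()         # the lock being protected
        self._count_guard = threading.Lock()  # protects the requester count
        self._requesters = 0
        self.spin_threshold = spin_threshold

    def acquire(self):
        # Register as a requester and take a snapshot of the count.
        with self._count_guard:
            self._requesters += 1
            requesters = self._requesters

        # Fast path: the lock is free.
        if self._lock.acquire(blocking=False):
            return

        if choose_wait_policy(requesters, self.spin_threshold) == "spin":
            # Low contention: poll until the lock becomes free.
            while not self._lock.acquire(blocking=False):
                time.sleep(0)  # yield so other threads can run
        else:
            # High contention: stop polling right away and block,
            # standing in for the power-saving state.
            self._lock.acquire()

    def release(self):
        with self._count_guard:
            self._requesters -= 1
        self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()
```

A caller would use it like any other lock, for example `with RequesterBasedLock(spin_threshold=2): ...`. The threshold value here is arbitrary; the abstract does not state what threshold the authors use or how the kernel version enters the power-saving state.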
