Round Robin Thread Selection Optimization in Multithreaded Processors

Parallel Process. Lett. Pub Date: 2019-05-10. DOI: 10.1142/S0129626419500038

Shane Carroll, Wei-Ming Lin


Citations: 0

Abstract

We propose a variation of round-robin ordering in a multi-threaded pipeline to increase system throughput and resource distribution fairness. We show that using round robin with a typical arbitrary ordering results in inefficient use of shared resources and subsequent thread starvation. To address this while still using a simple round-robin approach, we optimally and dynamically sort the order of the round robin periodically at runtime. We show that with 4-threaded workloads, throughput can be improved by over 9% and harmonic throughput by over 3% by sorting thread order at runtime. We experiment with multiple stages of the pipeline and show consistent results throughout several experiments using the SPEC CPU 2006 benchmarks. Furthermore, since the technique is still a simple round robin, the increased performance requires little overhead to implement.
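The idea of a round robin whose visiting order is periodically re-sorted can be illustrated with a minimal software sketch. This is not the authors' implementation: the abstract does not state the sort key or the re-sorting interval, so the class name `SortedRoundRobin`, the `period` parameter, and the per-thread `occupancy` metric used as the sort key are all hypothetical choices made for illustration.

```python
from typing import Callable, List


class SortedRoundRobin:
    """Round-robin selector whose visiting order is re-sorted every `period` cycles.

    Hypothetical sketch: the sort key and period are assumptions, not taken
    from the paper.
    """

    def __init__(self, num_threads: int, period: int,
                 sort_key: Callable[[int], float]) -> None:
        if num_threads <= 0 or period <= 0:
            raise ValueError("num_threads and period must be positive")
        self.order: List[int] = list(range(num_threads))  # arbitrary initial order
        self.period = period
        self.sort_key = sort_key
        self.pos = 0    # index into self.order of the next thread to pick
        self.cycle = 0  # number of selections made so far

    def select(self) -> int:
        """Return the thread id chosen for this cycle."""
        # At each period boundary (except cycle 0), re-sort the order and
        # restart the rotation from the front of the new order.
        if self.cycle > 0 and self.cycle % self.period == 0:
            self.order.sort(key=self.sort_key)
            self.pos = 0
        tid = self.order[self.pos]
        self.pos = (self.pos + 1) % len(self.order)
        self.cycle += 1
        return tid


# Assumed metric: shared-resource entries currently held by each thread.
occupancy = {0: 30, 1: 5, 2: 20, 3: 10}
rr = SortedRoundRobin(num_threads=4, period=4, sort_key=lambda t: occupancy[t])
picks = [rr.select() for _ in range(8)]
print(picks)
```

Between re-sorts the selector is an ordinary round robin, which matches the abstract's point that the scheme stays simple; only the order in which threads are visited changes. Sorting threads with lower occupancy first is one plausible heuristic among many and should not be read as the ordering the paper evaluates.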
