片上机器学习在多线程处理器动态资源控制中的应用

Parallel Process. Lett. Pub Date : 2019-10-01 DOI:10.1142/s0129626419500130

Shane Carroll, Wei-Ming Lin

{"title":"片上机器学习在多线程处理器动态资源控制中的应用","authors":"Shane Carroll, Wei-Ming Lin","doi":"10.1142/s0129626419500130","DOIUrl":null,"url":null,"abstract":"In this paper, we propose a machine learning algorithm to control instruction fetch bandwidth in a simultaneous multithreaded CPU. In a simultaneous multithreaded CPU, multiple threads occupy pools of hardware resources in the same clock cycle. Under some conditions, one or more threads may undergo a period of inefficiency, e.g., a cache miss, thereby inefficiently using shared resources and degrading the performance of other threads. If these inefficiencies can be identified at runtime, the offending thread can be temporarily blocked from fetching new instructions into the pipeline and given time to recover from its inefficiency, and prevent the shared system resources from being wasted on a stalled thread. In this paper, we propose a machine learning approach to determine when a thread should be blocked from fetching new instructions. The model is trained offline and the parameters embedded in a CPU, which can be queried with runtime statistics to determine if a thread is running inefficiently and should be temporarily blocked from fetching. We propose two models: a simple linear model and a higher-capacity neural network. We test each model in a simulation environment and show that system performance can increase by up to 19% on average with a feasible implementation of the proposed algorithm.","PeriodicalId":422436,"journal":{"name":"Parallel Process. Lett.","volume":"77 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Applied On-Chip Machine Learning for Dynamic Resource Control in Multithreaded Processors\",\"authors\":\"Shane Carroll, Wei-Ming Lin\",\"doi\":\"10.1142/s0129626419500130\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we propose a machine learning algorithm to control instruction fetch bandwidth in a simultaneous multithreaded CPU. In a simultaneous multithreaded CPU, multiple threads occupy pools of hardware resources in the same clock cycle. Under some conditions, one or more threads may undergo a period of inefficiency, e.g., a cache miss, thereby inefficiently using shared resources and degrading the performance of other threads. If these inefficiencies can be identified at runtime, the offending thread can be temporarily blocked from fetching new instructions into the pipeline and given time to recover from its inefficiency, and prevent the shared system resources from being wasted on a stalled thread. In this paper, we propose a machine learning approach to determine when a thread should be blocked from fetching new instructions. The model is trained offline and the parameters embedded in a CPU, which can be queried with runtime statistics to determine if a thread is running inefficiently and should be temporarily blocked from fetching. We propose two models: a simple linear model and a higher-capacity neural network. We test each model in a simulation environment and show that system performance can increase by up to 19% on average with a feasible implementation of the proposed algorithm.\",\"PeriodicalId\":422436,\"journal\":{\"name\":\"Parallel Process. Lett.\",\"volume\":\"77 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Parallel Process. Lett.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1142/s0129626419500130\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Parallel Process. Lett.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/s0129626419500130","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

在本文中，我们提出了一种机器学习算法来控制同步多线程CPU中的指令获取带宽。在并发多线程CPU中，多个线程在同一个时钟周期内占用硬件资源池。在某些情况下，一个或多个线程可能会经历一段时间的低效率，例如，缓存丢失，从而低效地使用共享资源并降低其他线程的性能。如果可以在运行时识别这些低效率，则可以暂时阻止问题线程向管道中获取新指令，并给予时间从低效率中恢复，并防止共享系统资源浪费在停滞的线程上。在本文中，我们提出了一种机器学习方法来确定何时应该阻止线程获取新指令。该模型是离线训练的，参数嵌入到CPU中，可以通过运行时统计信息查询CPU，以确定线程是否运行效率低下，是否应该暂时阻止提取。我们提出了两个模型:一个简单的线性模型和一个高容量的神经网络。我们在仿真环境中对每个模型进行了测试，结果表明，通过提出的算法的可行实现，系统性能平均可提高19%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Applied On-Chip Machine Learning for Dynamic Resource Control in Multithreaded Processors

In this paper, we propose a machine learning algorithm to control instruction fetch bandwidth in a simultaneous multithreaded CPU. In a simultaneous multithreaded CPU, multiple threads occupy pools of hardware resources in the same clock cycle. Under some conditions, one or more threads may undergo a period of inefficiency, e.g., a cache miss, thereby inefficiently using shared resources and degrading the performance of other threads. If these inefficiencies can be identified at runtime, the offending thread can be temporarily blocked from fetching new instructions into the pipeline and given time to recover from its inefficiency, and prevent the shared system resources from being wasted on a stalled thread. In this paper, we propose a machine learning approach to determine when a thread should be blocked from fetching new instructions. The model is trained offline and the parameters embedded in a CPU, which can be queried with runtime statistics to determine if a thread is running inefficiently and should be temporarily blocked from fetching. We propose two models: a simple linear model and a higher-capacity neural network. We test each model in a simulation environment and show that system performance can increase by up to 19% on average with a feasible implementation of the proposed algorithm.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Parallel Process. Lett.

自引率

0.00%

发文量

期刊最新文献

A Note to Non-adaptive Broadcasting Semi-Supervised Node Classification via Semi-Global Graph Transformer Based on Homogeneity Augmentation 4-Free Strong Digraphs with the Maximum Size Relation-aware Graph Contrastive Learning The Normalized Laplacian Spectrum of Folded Hypercube with Applications