Kostiantyn Berezovskyi, K. Bletsas, Björn Andersson
{"title":"运行在单个流多处理器上的GPU线程的最大跨度计算","authors":"Kostiantyn Berezovskyi, K. Bletsas, Björn Andersson","doi":"10.1109/ECRTS.2012.16","DOIUrl":null,"url":null,"abstract":"Graphics processors were originally developed for rendering graphics but have recently evolved towards being an architecture for general-purpose computations. They are also expected to become important parts of embedded systems hardware -- not just for graphics. However, this necessitates the development of appropriate timing analysis techniques which would be required because techniques developed for CPU scheduling are not applicable. The reason is that we are not interested in how long it takes for any given GPU thread to complete, but rather how long it takes for all of them to complete. We therefore develop a simple method for finding an upper bound on the make span of a group of GPU threads executing the same program and competing for the resources of a single streaming multiprocessor (whose architecture is based on NVIDIA Fermi, with some simplifying assumptions). We then build upon this method to formulate the derivation of the exact worst-case make span (and corresponding schedule) as an optimization problem. Addressing the issue of tractability, we also present a technique for efficiently computing a safe estimate of the worst-case make span with minimal pessimism, for use when finding an exact value would take too long.","PeriodicalId":425794,"journal":{"name":"2012 24th Euromicro Conference on Real-Time Systems","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":"{\"title\":\"Makespan Computation for GPU Threads Running on a Single Streaming Multiprocessor\",\"authors\":\"Kostiantyn Berezovskyi, K. Bletsas, Björn Andersson\",\"doi\":\"10.1109/ECRTS.2012.16\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Graphics processors were originally developed for rendering graphics but have recently evolved towards being an architecture for general-purpose computations. They are also expected to become important parts of embedded systems hardware -- not just for graphics. However, this necessitates the development of appropriate timing analysis techniques which would be required because techniques developed for CPU scheduling are not applicable. The reason is that we are not interested in how long it takes for any given GPU thread to complete, but rather how long it takes for all of them to complete. We therefore develop a simple method for finding an upper bound on the make span of a group of GPU threads executing the same program and competing for the resources of a single streaming multiprocessor (whose architecture is based on NVIDIA Fermi, with some simplifying assumptions). We then build upon this method to formulate the derivation of the exact worst-case make span (and corresponding schedule) as an optimization problem. Addressing the issue of tractability, we also present a technique for efficiently computing a safe estimate of the worst-case make span with minimal pessimism, for use when finding an exact value would take too long.\",\"PeriodicalId\":425794,\"journal\":{\"name\":\"2012 24th Euromicro Conference on Real-Time Systems\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-07-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"25\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 24th Euromicro Conference on Real-Time Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ECRTS.2012.16\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 24th Euromicro Conference on Real-Time Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ECRTS.2012.16","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Makespan Computation for GPU Threads Running on a Single Streaming Multiprocessor
Graphics processors were originally developed for rendering graphics but have recently evolved towards being an architecture for general-purpose computations. They are also expected to become important parts of embedded systems hardware -- not just for graphics. However, this necessitates the development of appropriate timing analysis techniques which would be required because techniques developed for CPU scheduling are not applicable. The reason is that we are not interested in how long it takes for any given GPU thread to complete, but rather how long it takes for all of them to complete. We therefore develop a simple method for finding an upper bound on the make span of a group of GPU threads executing the same program and competing for the resources of a single streaming multiprocessor (whose architecture is based on NVIDIA Fermi, with some simplifying assumptions). We then build upon this method to formulate the derivation of the exact worst-case make span (and corresponding schedule) as an optimization problem. Addressing the issue of tractability, we also present a technique for efficiently computing a safe estimate of the worst-case make span with minimal pessimism, for use when finding an exact value would take too long.