{"title":"Outline of a Thick Control Flow Architecture","authors":"M. Forsell, J. Roivainen, V. Leppänen","doi":"10.1109/SBAC-PADW.2016.9","DOIUrl":null,"url":null,"abstract":"The recently invented thick control flow (TCF) model packs together an unbounded number of fibers, thread-like computational entities that flow through the same control path. This promises to simplify parallel programming by partially eliminating looping and artificial thread arithmetic. In this paper we outline an architecture for efficiently executing programs written for the TCF model. It features scalable latency hiding via replication of instructions, a radical reduction in synchronization cost via a wave-based synchronization mechanism, and improved exploitation of low-level parallelism via chaining of functional units. Replication of instructions is supported by a dynamic multithreading-like mechanism, which saves the fiber-wise data into special replicated register blocks. The architecture provides programmers with a compact, unbounded notation for fibers and groups of fibers, together with strong synchronous shared-memory algorithmics. According to evaluations, the architecture efficiently handles workloads featuring computational elements with the same control flow, independently of the number of elements. In turn, this pays off as improved performance and lower power consumption, owing to the elimination of redundant parts of computation and machinery.","PeriodicalId":186179,"journal":{"name":"2016 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SBAC-PADW.2016.9","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
The recently invented thick control flow (TCF) model packs together an unbounded number of fibers, thread-like computational entities that flow through the same control path. This promises to simplify parallel programming by partially eliminating looping and artificial thread arithmetic. In this paper we outline an architecture for efficiently executing programs written for the TCF model. It features scalable latency hiding via replication of instructions, a radical reduction in synchronization cost via a wave-based synchronization mechanism, and improved exploitation of low-level parallelism via chaining of functional units. Replication of instructions is supported by a dynamic multithreading-like mechanism, which saves the fiber-wise data into special replicated register blocks. The architecture provides programmers with a compact, unbounded notation for fibers and groups of fibers, together with strong synchronous shared-memory algorithmics. According to evaluations, the architecture efficiently handles workloads featuring computational elements with the same control flow, independently of the number of elements. In turn, this pays off as improved performance and lower power consumption, owing to the elimination of redundant parts of computation and machinery.
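
The claim about eliminating looping and artificial thread arithmetic can be illustrated with a small sketch. It is not the paper's notation or its architecture: the `ThickFlow` class, its `apply` method, and both function names are hypothetical, and the fibers are simulated sequentially in plain Python. The first function shows the index bookkeeping a conventional thread-based program needs; the second expresses the same element-wise addition as a single flow whose thickness equals the number of elements.

```python
from typing import Callable, List


def add_with_thread_arithmetic(a: List[int], b: List[int], num_threads: int) -> List[int]:
    """Conventional style: each thread derives its slice bounds from its id, then loops."""
    n = len(a)
    c = [0] * n
    chunk = (n + num_threads - 1) // num_threads  # ceiling division
    for tid in range(num_threads):  # threads are simulated one after another
        start = tid * chunk
        end = min(start + chunk, n)
        for i in range(start, end):
            c[i] = a[i] + b[i]
    return c


class ThickFlow:
    """Hypothetical stand-in for one control flow made of `thickness` fibers."""

    def __init__(self, thickness: int) -> None:
        self.thickness = thickness

    def apply(self, op: Callable[[int], int]) -> List[int]:
        # The same operation is replicated for every fiber; each fiber
        # only needs its own id, with no slice or bounds computation.
        return [op(fiber_id) for fiber_id in range(self.thickness)]


def add_with_thick_flow(a: List[int], b: List[int]) -> List[int]:
    """Thick-flow style: set the thickness to the data size and state the operation once."""
    flow = ThickFlow(thickness=len(a))
    return flow.apply(lambda f: a[f] + b[f])
```

Both functions return the same result for equal-length inputs. The difference is where the partitioning logic lives: in the first it is written by the programmer, while in the second it is left to whatever executes the flow, which in the paper's setting is the proposed architecture.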