Optimizing Memory Access in TCF Processors with Compute-Update Operations

M. Forsell, J. Roivainen, J. Träff
{"title":"利用计算更新操作优化TCF处理器中的内存访问","authors":"M. Forsell, J. Roivainen, J. Träff","doi":"10.1109/IPDPSW50202.2020.00100","DOIUrl":null,"url":null,"abstract":"The thick control flow (TCF) model is a data parallel abstraction of the thread model. It merges homogeneous threads (called fibers) flowing through the same control path to entities (called TCFs) with a single control flow and multiple data flows. Fibers of a TCF are executed synchronously with respect to each other and the number of them can be altered dynamically at runtime. Multiple TCFs can be executed in parallel to support control parallelism. In our previous work, we have outlined a special architecture, TPA (Thick control flow Processor Architecture), for executing TCF programs efficiently and shown that designing algorithms with the TCF model often leads to increased performance and simplified programs due to higher abstraction, eliminated loops and redundant program elements.Compute-update memory operations, such as multioperations and atomic instructions, are known to speed up parallel algorithms performing reductions and synchronizations. In this paper, we propose special compute-update memory operations for TCF processors to optimize iterative exclusive inter-fiber memory access patterns. Acceleration is achieved, e.g., in matrix addition and log-prefix style patterns in which multiple target locations can interchange data without reloads between the instructions that slows down execution. Our solution is based on modified active memory units and special memory operations that can send their reply value to another fiber than that initiating the access. We implement these operations in our TPA processor with a minimal HW cost and show that the expected speedups are achieved. Programming examples are given.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Optimizing Memory Access in TCF Processors with Compute-Update Operations\",\"authors\":\"M. Forsell, J. Roivainen, J. Träff\",\"doi\":\"10.1109/IPDPSW50202.2020.00100\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The thick control flow (TCF) model is a data parallel abstraction of the thread model. It merges homogeneous threads (called fibers) flowing through the same control path to entities (called TCFs) with a single control flow and multiple data flows. Fibers of a TCF are executed synchronously with respect to each other and the number of them can be altered dynamically at runtime. Multiple TCFs can be executed in parallel to support control parallelism. In our previous work, we have outlined a special architecture, TPA (Thick control flow Processor Architecture), for executing TCF programs efficiently and shown that designing algorithms with the TCF model often leads to increased performance and simplified programs due to higher abstraction, eliminated loops and redundant program elements.Compute-update memory operations, such as multioperations and atomic instructions, are known to speed up parallel algorithms performing reductions and synchronizations. In this paper, we propose special compute-update memory operations for TCF processors to optimize iterative exclusive inter-fiber memory access patterns. 
Acceleration is achieved, e.g., in matrix addition and log-prefix style patterns in which multiple target locations can interchange data without reloads between the instructions that slows down execution. Our solution is based on modified active memory units and special memory operations that can send their reply value to another fiber than that initiating the access. We implement these operations in our TPA processor with a minimal HW cost and show that the expected speedups are achieved. Programming examples are given.\",\"PeriodicalId\":398819,\"journal\":{\"name\":\"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)\",\"volume\":\"40 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPSW50202.2020.00100\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW50202.2020.00100","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

The thick control flow (TCF) model is a data-parallel abstraction of the thread model. It merges homogeneous threads (called fibers) flowing through the same control path into entities (called TCFs) with a single control flow and multiple data flows. Fibers of a TCF are executed synchronously with respect to each other, and their number can be altered dynamically at runtime. Multiple TCFs can be executed in parallel to support control parallelism. In our previous work, we outlined a special architecture, TPA (Thick control flow Processor Architecture), for executing TCF programs efficiently and showed that designing algorithms with the TCF model often leads to increased performance and simplified programs due to higher abstraction and the elimination of loops and redundant program elements. Compute-update memory operations, such as multioperations and atomic instructions, are known to speed up parallel algorithms performing reductions and synchronizations. In this paper, we propose special compute-update memory operations for TCF processors to optimize iterative exclusive inter-fiber memory access patterns. Acceleration is achieved, e.g., in matrix addition and log-prefix style patterns in which multiple target locations can interchange data without the reloads between instructions that would otherwise slow down execution. Our solution is based on modified active memory units and special memory operations that can send their reply value to a fiber other than the one initiating the access. We implement these operations in our TPA processor with minimal hardware cost and show that the expected speedups are achieved. Programming examples are given.
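To make the log-prefix pattern mentioned in the abstract concrete, the following is a minimal illustrative sketch: a plain sequential Python simulation of a Hillis-Steele inclusive scan. It is not TPA code and does not use the paper's actual instruction set; the comments only indicate where a compute-update operation whose reply is redirected to another fiber would remove the per-step reloads.

# Illustrative simulation only: a Hillis-Steele (log-prefix) inclusive scan.
# In a conventional implementation, every step requires each fiber to reload
# its operand from memory. A compute-update operation whose reply is routed
# to a different fiber (as proposed for the modified active memory units)
# would let the memory unit perform the add and forward the result directly,
# so the receiving fiber already holds its operand for the next step.

def log_prefix_scan(x):
    """Inclusive prefix sum in the log-prefix style: at step d,
    location i is combined with location i - d."""
    n = len(x)
    mem = list(x)              # shared memory, one word per fiber
    d = 1
    while d < n:
        new = list(mem)
        for i in range(d, n):  # fibers of one TCF execute this step synchronously
            # Conceptually: fiber i - d issues a compute-update to location i;
            # the reply (mem[i] + mem[i - d]) could be delivered to fiber i
            # instead of fiber i - d, avoiding a reload before the next step.
            new[i] = mem[i] + mem[i - d]
        mem = new
        d *= 2
    return mem

if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 9, 2, 6]
    print(log_prefix_scan(data))   # [3, 4, 8, 9, 14, 23, 25, 31]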