
Proceedings of the 2020 Sixth Workshop on Programming Models for SIMD/Vector Processing: Latest Papers

Scheduling Irregular Dataflow Pipelines on SIMD Architectures
Tom Plano, J. Buhler
Streaming computations often exhibit substantial data parallelism that makes them well-suited to SIMD architectures. However, many such computations also exhibit irregularity, in the form of data-dependent, dynamic data rates, that makes efficient SIMD execution challenging. One aspect of this challenge is the need to schedule execution of a computation realized as a pipeline of stages connected by finite queues. A scheduler must both ensure high SIMD occupancy by gathering queued items into vectors and minimize costs associated with switching execution between stages. In this work, we present the AFIE (Active Full, Inactive Empty) scheduling policy for irregular streaming applications on SIMD processors. AFIE provably groups inputs to each stage of a pipeline into a minimal number of SIMD vectors while incurring a bounded number of switches relative to the best possible policy. These results apply even though irregularity forbids a priori knowledge of how many outputs will be generated from each input to each stage. We have implemented AFIE as an extension to the MERCATOR system [6] for building irregular streaming applications on NVIDIA GPUs. We describe how the AFIE scheduler simplifies MERCATOR's runtime code and empirically measure the new scheduler's improved performance on irregular streaming applications.
DOI: 10.1145/3380479.3380480 (https://doi.org/10.1145/3380479.3380480). Published: 2020-02-22.
Citations: 5
SIMD-based Exact Parallel Fuzzy Dilation Operator for Fast Computing of Fuzzy Spatial Relations
Régis Pierrard, Laurent Cabaret, Jean-Philippe Poli, C. Hudelot
For decades, fuzzy spatial relations have demonstrated their utility and effectiveness for visual reasoning, including semantic annotation and object recognition. However, a major issue is that they often involve compute-intensive fuzzy morphological operators, which leads to long latency in relation evaluation. As a result, approximate methods have been proposed to compute some relations in an acceptable time, but they are not as generic as the fuzzy dilation or do not make the most of modern computing architectures. In this paper, we introduce the Reverse and the Parallel Reverse (PR) algorithms. Reverse is an exact and efficient algorithm for the fuzzy dilation operator, and PR combines the exactness of Reverse with efficient use of the multiple cores of modern processors through OpenMP. With SIMD extensions enhancing Parallel Reverse, PR128 (AVX), PR256 (AVX2), and PR512 (AVX512) are faster than the state-of-the-art approximate methods while remaining generic and exact. To demonstrate the performance of PR and highlight the contribution of the SIMD instructions, an extensive benchmark was carried out on two datasets of natural and artificial images.
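For readers unfamiliar with the operator, the following is a minimal sketch of the textbook definition of fuzzy dilation, D(x) = sup_y min(mu(y), nu(x - y)), written as a naive scalar loop. It is not the Reverse or Parallel Reverse algorithm from the abstract; the function name `fuzzy_dilation`, the choice of the minimum t-norm, and the list-of-lists image layout are assumptions made only for illustration.

```python
def fuzzy_dilation(mu, nu):
    """Naive fuzzy dilation of mu by structuring element nu (minimum t-norm).

    mu: 2D list of membership values in [0, 1], size H x W.
    nu: 2D list of membership values in [0, 1] with odd height and width,
        centered on its middle cell (hypothetical layout for this sketch).
    """
    h, w = len(mu), len(mu[0])
    kh, kw = len(nu), len(nu[0])
    rh, rw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for x in range(h):
        for y in range(w):
            best = 0.0
            for i in range(kh):
                for j in range(kw):
                    # Source pixel reached with offset (i - rh, j - rw).
                    px, py = x - (i - rh), y - (j - rw)
                    if 0 <= px < h and 0 <= py < w:
                        best = max(best, min(mu[px][py], nu[i][j]))
            out[x][y] = best
    return out
```

The cost of this direct form grows with the product of the image size and the structuring-element size, which gives a sense of why the operator is described as compute-intensive.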
DOI: 10.1145/3380479.3380482 (https://doi.org/10.1145/3380479.3380482). Published: 2020-02-22.
Citations: 2
How to speed Connected Component Labeling up with SIMD RLE algorithms
F. Lemaitre, A. Hennequin, L. Lacassagne
Research on Connected Component Labeling (CCL), although long-standing, is still very active: several efficient algorithms for CPUs and GPUs have emerged in recent years, and performance keeps improving. This article introduces a new SIMD run-based algorithm for CCL. We show how RLE compression can be SIMDized and used to accelerate scalar run-based CCL algorithms. A benchmark on Intel, AMD and ARM processors shows that this new algorithm outperforms the state of the art by an average factor of 1.7× on AVX2 machines and 1.9× on Intel Xeon Skylake with AVX512.
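As background on what "run-based" means here, the sketch below shows a plain scalar run-length encoding of one binary image row into half-open [start, end) intervals, the kind of per-row input that run-based CCL algorithms operate on. It is only an illustrative baseline, not the SIMD algorithm described in the abstract; the function name `row_runs` and the interval convention are assumptions.

```python
def row_runs(row):
    """Return the runs of foreground pixels in a binary row as (start, end) pairs.

    Each pair is half-open: pixels start .. end - 1 are foreground.
    """
    runs = []
    start = None  # Index where the current run began, or None outside a run.
    for x, pixel in enumerate(row):
        if pixel and start is None:
            start = x                 # A new run opens here.
        elif not pixel and start is not None:
            runs.append((start, x))   # The current run closes before x.
            start = None
    if start is not None:
        runs.append((start, len(row)))  # Run reaching the end of the row.
    return runs
```

For example, `row_runs([0, 1, 1, 0, 1, 0, 0, 1, 1, 1])` returns `[(1, 3), (4, 5), (7, 10)]`. The loop carries a dependency from one pixel to the next through `start`, which is the part a vectorized version has to restructure.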
DOI: 10.1145/3380479.3380481 (https://doi.org/10.1145/3380479.3380481). Published: 2020-02-22.
Citations: 10