Parallel implementation of real-time semi-global matching on embedded multi-core architectures

Oliver Jakob Arndt, Daniel Becker, C. Banz, H. Blume
DOI: 10.1109/SAMOS.2013.6621106
Published in: 2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), 15 July 2013
Citations: 23

Abstract

Embedded real-time algorithms are often realized with dedicated hardware, which entails high production costs and offers little programming flexibility once deployed. For instance, semi-global matching for stereo image processing, with its complex data flows, traditionally runs on customized hardware modules. By combining the processing and memory capabilities of multiple individual cores, emerging embedded multi-core technologies address these problems. However, because of concurrency issues (e.g., data races and lock contention), parallel programming requires experienced programmers as well as technology-specific techniques (e.g., synchronization libraries) and tools (e.g., parallel profilers), which are often not available on embedded platforms. In this work, we introduce a parallel version of a semi-global matching algorithm and, as a case study, demonstrate the runtime optimizations necessary to meet real-time requirements. We also present the structured steps of the applied parallelization workflow, illustrating an efficient strategy for migrating to multi-core platforms using runtime information (e.g., profiles and hardware counters). Finally, to evaluate the resulting performance characteristics, we compare the runtime behavior of the parallel version on a Freescale P4080 processor with reference values measured on an Intel i7, a field-programmable logic device, an extended general-purpose processor, and a GPU.
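The abstract does not spell out the matching algorithm itself. As a point of reference, the minimal Python sketch below shows the path-wise cost-aggregation recurrence commonly used in semi-global matching (Hirschmüller's formulation) for a single left-to-right path, together with one race-free way of splitting that work: every image row is an independent horizontal path, so rows can be handed to different workers without locks, because each task writes only its own result. The function names, the nested-list cost-volume layout (`cost_volume[y][x][d]`), and the row-wise decomposition are assumptions made for illustration; this is not the authors' P4080 implementation. Python threads are used only to show the decomposition; because of the interpreter lock they do not provide real CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def aggregate_row_left_to_right(costs, p1, p2):
    """Aggregate matching costs along one scanline (left-to-right path).

    costs[x][d] is the matching cost C(p, d) of pixel x at disparity d.
    Implements the common SGM recurrence:
      L(p,d) = C(p,d) + min(L(p-r,d), L(p-r,d-1)+P1, L(p-r,d+1)+P1,
                            min_k L(p-r,k)+P2) - min_k L(p-r,k)
    """
    if not costs:
        return []
    num_disp = len(costs[0])
    agg = [list(costs[0])]  # first pixel on the path: L = C
    for x in range(1, len(costs)):
        prev = agg[-1]
        prev_min = min(prev)
        cur = []
        for d in range(num_disp):
            best = prev[d]  # same disparity, no penalty
            if d > 0:
                best = min(best, prev[d - 1] + p1)  # small disparity change
            if d < num_disp - 1:
                best = min(best, prev[d + 1] + p1)
            best = min(best, prev_min + p2)  # large disparity jump
            # Subtracting prev_min keeps the aggregated values bounded.
            cur.append(costs[x][d] + best - prev_min)
        agg.append(cur)
    return agg


def aggregate_image_parallel(cost_volume, p1, p2, workers=4):
    """Hypothetical helper: one task per row (rows are independent paths).

    Each task reads only its own row and returns a fresh list, so no
    shared state is written and no locks are required.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda row: aggregate_row_left_to_right(row, p1, p2),
            cost_volume))


if __name__ == "__main__":
    volume = [[[0, 5], [5, 0]], [[2, 2], [1, 4]]]
    print(aggregate_image_parallel(volume, p1=1, p2=3))
```

A complete semi-global matcher aggregates several path directions, sums them per pixel and disparity, and selects the disparity with minimum total cost; diagonal and vertical paths need a different work split than the row-wise one shown here.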