Oliver Jakob Arndt, Daniel Becker, C. Banz, H. Blume
Parallel implementation of real-time semi-global matching on embedded multi-core architectures
2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)
Published: 2013-07-15
DOI: 10.1109/SAMOS.2013.6621106 (https://doi.org/10.1109/SAMOS.2013.6621106)
Citations: 23
Abstract
Embedded real-time algorithms are often implemented in dedicated hardware, which incurs high production costs and leaves little programming flexibility afterwards. For instance, semi-global matching for stereo image processing, which involves complex data flows, traditionally runs on customized hardware modules. Emerging embedded multi-core technologies address these problems by combining the processing and memory capabilities of multiple individual cores. However, because of concurrency issues (e.g., data races and lock contention), parallel programming requires experienced programmers as well as technology-specific techniques (e.g., synchronization libraries) and tools (e.g., parallel profilers), which are often not available on embedded platforms. In this work, we introduce a parallel version of a semi-global matching algorithm and use it as a case study to demonstrate the runtime optimizations necessary to meet real-time requirements. We also present the structured steps of the applied parallelization workflow, illustrating an efficient migration strategy to multi-core platforms that uses runtime information (e.g., profiles and hardware counters). Finally, to evaluate the resulting performance characteristics, we compare the runtime behavior of the parallel version running on a Freescale P4080 processor with reference values obtained on an Intel i7, a field-programmable logic device, an extended general-purpose processor, and a GPU.