DARTS: Techniques and Tools for Predictably Fast Memory Using Integrated Data Allocation and Real-Time Task Scheduling
Sangyeol Kang, A. Dean
2010 16th IEEE Real-Time and Embedded Technology and Applications Symposium, 2010-04-12
DOI: 10.1109/RTAS.2010.36 (https://doi.org/10.1109/RTAS.2010.36)
Citations: 15
Abstract
Hardware-managed caches introduce large amounts of timing variability, complicating real-time system design. One alternative is a memory system with scratchpad memories, which improve system performance while eliminating such timing variability. Prior work introduced the DARTS approach, which combines static allocation of data into scratchpad memories with task scheduling for preemptive, multi-threaded, hard real-time embedded systems.

This study offers several significant contributions. First, it introduces a method to split a stack frame across multiple memory units, offering fine-grained allocation of automatic variables with very low run-time overhead. This enables more effective use of fast memory, improving run times. Second, it introduces the completed tool chain based on DARTS, which reallocates static and automatic variables across multiple memory banks and now targets the ARM7 architecture. Third, it evaluates the performance improvement from DARTS using experimental results from code running on real hardware in a preemptively scheduled, RTOS-based multitasking environment. This hands-on experimental approach ensures a high level of confidence in the results; previous studies have generally stopped at estimating performance rather than building and measuring a real implementation.

In our experiments, the execution time of each task is reduced by up to 24% relative to the baseline external-SRAM configurations. Our methods achieve 37% to 99% of the performance improvement of an ideal, unlimited-capacity scratchpad memory system. Finally, our allocations provide on average two-thirds of the performance improvement of an equivalently sized cache, yet with easily predicted performance.
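
To make the "fraction of the ideal improvement" figure concrete, the following is a minimal sketch, not the DARTS tool chain. It assumes a simple greedy accesses-per-byte heuristic in place of the paper's allocator, and every variable name, size, access count, and per-access cycle cost is hypothetical. It also assumes the fraction is computed as (baseline - achieved) / (baseline - ideal), which is one plausible reading of the abstract's metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str
    size: int      # bytes (hypothetical)
    accesses: int  # profiled access count (hypothetical)


def greedy_allocate(variables, capacity):
    """Choose variables for fast memory by accesses per byte.

    This is a stand-in heuristic for illustration, not the DARTS allocator.
    """
    chosen = set()
    free = capacity
    for v in sorted(variables, key=lambda v: v.accesses / v.size, reverse=True):
        if v.size <= free:
            chosen.add(v.name)
            free -= v.size
    return chosen


def memory_cycles(variables, in_fast, fast_cost=1, slow_cost=3):
    """Total memory-access cycles; the per-access costs are assumed values."""
    return sum(
        v.accesses * (fast_cost if v.name in in_fast else slow_cost)
        for v in variables
    )


def fraction_of_ideal(baseline, achieved, ideal):
    """Share of the ideal (unlimited scratchpad) improvement actually obtained."""
    return (baseline - achieved) / (baseline - ideal)


VARS = [
    Var("buf", 256, 1000),
    Var("idx", 4, 400),
    Var("coef", 64, 800),
    Var("log", 512, 100),
]

chosen = greedy_allocate(VARS, capacity=128)           # 128-byte scratchpad
baseline = memory_cycles(VARS, in_fast=set())          # everything in slow memory
ideal = memory_cycles(VARS, {v.name for v in VARS})    # unlimited scratchpad
achieved = memory_cycles(VARS, chosen)
frac = fraction_of_ideal(baseline, achieved, ideal)
```

With these made-up numbers, only `idx` and `coef` fit in the 128-byte scratchpad, and the allocation recovers roughly half of the improvement an unlimited scratchpad would give. Because the placement is fixed ahead of time, the cycle count is the same on every run, which is the predictability argument the abstract makes against caches.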