Sequential specification of time-aware stream processing applications (Extended abstract)
Pub Date: 2012-10-11 | DOI: 10.1145/2435227.2435231
Stefan J. Geuns, J. Hausmans, M. Bekooij
Stream processing applications, and in particular Software Defined Radio applications, are typically executed on multi-core systems. Such applications often have real-time throughput constraints. Automatic parallelization of Nested Loop Programs (NLPs) is an attractive method to create embedded real-time stream processing applications for multi-core systems [1]. However, the description and parallelization of applications with time-dependent functional behavior has not been considered for NLPs. In such a description, semantic information about time-dependent behavior must be made available to the compiler, so that an optimized time-independent implementation can be generated automatically.
{"title":"Sequential specification of time-aware stream processing applications (Extended abstract)","authors":"Stefan J. Geuns, J. Hausmans, M. Bekooij","doi":"10.1145/2435227.2435231","DOIUrl":"https://doi.org/10.1145/2435227.2435231","url":null,"abstract":"Stream processing applications, and in particular Software Defined Radio applications, are typically executed on multi-core systems. Such applications often have real-time throughput constraints. Automatic parallelization of Nested Loop Programs (NLPs) is an attractive method to create embedded real-time stream processing applications for multi-core systems [1]. However, the description and parallelization of applications with a time dependent functional behavior has not been considered for NLPs. In such a description, semantic information about time dependent behavior must be made available for the compiler, such that an optimized time independent implementation can be generated automatically.","PeriodicalId":431615,"journal":{"name":"2012 IEEE 10th Symposium on Embedded Systems for Real-time Multimedia","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116682069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Support for power efficient mobile video playback on simultaneous hybrid display
Pub Date: 2012-10-01 | DOI: 10.1109/ESTIMedia.2012.6507035
Y. Wen, Ziyi Liu, W. Shi, Yifei Jiang, A. Cheng, Feng Yang, Abhinav Kohar
Mobile devices, such as smartphones, e-books, and tablets, have limited battery capacity because of constraints on battery size and mobility requirements. The large color displays on these devices aggravate this situation, as they consume a large portion of the total battery power. A TOLED-EPD hybrid display, which integrates a transparent OLED (TOLED) with an electrophoretic display (EPD), has emerged to reduce the energy usage of displays. The technology displays information selectively on one of the two displays based on the update rate of the content, thereby reducing energy usage. In this paper, we propose a design for mobile video playback, Decoder4Hybrid, for such hybrid displays. The proposed approach supports encoded video playback based on the update frequency of each block, which is exploited by the hybrid display controller to determine which display should be used to show an MPEG-encoded block. A fast DCT-based heuristic algorithm is proposed to detect changes between frames at the block level with minimal computation cost. Experimental results show that the proposed approach can save up to 40% power with acceptable video quality.
{"title":"Support for power efficient mobile video playback on simultaneous hybrid display","authors":"Y. Wen, Ziyi Liu, W. Shi, Yifei Jiang, A. Cheng, Feng Yang, Abhinav Kohar","doi":"10.1109/ESTIMedia.2012.6507035","DOIUrl":"https://doi.org/10.1109/ESTIMedia.2012.6507035","url":null,"abstract":"Mobile devices, such as smartphones, e-books, and tablets, have limited battery capability because of the constraint of battery size and mobility requirement. However the large color displays on those devices put more tensions on this situation as the displays consume a large portion of the total battery power. A TOLED-EPD hybrid display that integrates a transparent OLED (TOLED) with an electrophoretic display (EPD) has been emerging to reduce the energy usage of displays. The technology displays information selectively on one of the displays based on the update rate of content, thus reduces the energy usage. In this paper, we propose a design of mobile video playback, Decoder4Hybrid, for the hybrid displays. The proposed approach supports encoded video playback based on the update frequency of each block, which is exploited by the hybrid display controller to determine which display should be used to show a MPEG encoded block. A fast DCT-based heuristic algorithm is proposed to detect the changes between frames at block level with minimal computation cost. Experimental results show that the proposed approach can save up to 40% power with acceptable video quality.","PeriodicalId":431615,"journal":{"name":"2012 IEEE 10th Symposium on Embedded Systems for Real-time Multimedia","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131498569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A lifetime aware buffer assignment method for streaming applications on DRAM/PRAM hybrid memory (Extended abstract)
Pub Date: 2012-10-01 | DOI: 10.1145/2435227.2435232
Daeyoung Lee, Hyunok Oh
This paper proposes a lifetime-aware buffer assignment method for streaming applications, such as multimedia, specified as a synchronous dataflow (SDF) graph on a DRAM/PRAM hybrid memory in which the endurance of the PRAM is limited. We determine whether buffers are assigned to DRAM or PRAM so as to minimize the write frequency of the PRAM. We formulate the resulting problems using Answer Set Programming (ASP). Experimental results show that the proposed approach increases the PRAM lifetime by 63% compared with no optimization, and they reveal the trade-off between PRAM and DRAM size required to guarantee a lifetime constraint.
{"title":"A lifetime aware buffer assignment method for streaming applications on DRAM/PRAM hybrid memory (Extended abstract)","authors":"Daeyoung Lee, Hyunok Oh","doi":"10.1145/2435227.2435232","DOIUrl":"https://doi.org/10.1145/2435227.2435232","url":null,"abstract":"This paper proposes a lifetime aware buffer assignment method for streaming applications like multimedia specified in a synchronous dataflow (SDF) graph on a DRAM/PRAM hybrid memory in which the endurance of PRAM is limited. We determine whether buffers are assigned to DRAM or PRAM to minimize the writing frequency of PRAM. To solve the problems, we formulate them using Answer Set Programming(ASP). Experimental results show that the proposed approach increases the PRAM lifetime by 63% compared with no optimization, and shows the tradeoff between PRAM and DRAM size to guarantee a lifetime constraint.","PeriodicalId":431615,"journal":{"name":"2012 IEEE 10th Symposium on Embedded Systems for Real-time Multimedia","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126253980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AVid: Annotation driven video decoding for hybrid memories
Pub Date: 2012-10-01 | DOI: 10.1109/ESTIMedia.2012.6507022
Liviu Codrut Stancu, L. A. Bathen, N. Dutt, A. Nicolau
Adopting emerging non-volatile memory (NVM) technologies is a viable way to minimize the increasing memory leakage power in today's embedded systems. However, to take advantage of the many benefits of NVMs, software must account for their high write overheads. This paper presents AVid, an annotation-driven video decoding technique for hybrid memory subsystems. AVid exploits the physical characteristics of NVMs by extracting video decoder access patterns and uses this meta-information to minimize write overheads, thereby improving energy savings and performance. Our experimental results on an annotation-aware H.264 codec show that our technique achieves reductions in execution time and energy of up to 40.8% and 39.7%, respectively, when applied to H.264 decoding.
{"title":"AVid: Annotation driven video decoding for hybrid memories","authors":"Liviu Codrut Stancu, L. A. Bathen, N. Dutt, A. Nicolau","doi":"10.1109/ESTIMedia.2012.6507022","DOIUrl":"https://doi.org/10.1109/ESTIMedia.2012.6507022","url":null,"abstract":"Adopting emerging non-volatile memory (NVM) technologies is a viable solution to minimize the increasing memory leakage power in today's embedded systems. However, in order to take advantage of the many benefits in NVMs, software must account for their high write overheads. This paper presents AVid, an annotation driven video decoding technique for hybrid memory subsystems. AVid exploits the physical characteristics of NVMs by extracting video decoder access patterns and uses this meta-information to minimize write overheads, thereby improving energy savings and performance. Our experimental results on an annotation-aware H.264 codec show that our technique is able to achieve execution time and energy reduction by up to 40.8% and 39.7% respectively when applied to H.264 decoding.","PeriodicalId":431615,"journal":{"name":"2012 IEEE 10th Symposium on Embedded Systems for Real-time Multimedia","volume":"668 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116101704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-objective mapping optimization via problem decomposition for many-core systems
Pub Date: 2012-10-01 | DOI: 10.1109/ESTIMedia.2012.6507026
Shin-Haeng Kang, Hoeseok Yang, Lars Schor, Iuliana Bacivarov, S. Ha, L. Thiele
With the trend towards many-core systems for dynamic multimedia applications, the size of the mapping optimization problem has grown to the point where conventional meta-heuristics are no longer effective. In this paper, we therefore propose a problem decomposition approach for large-scale optimization problems. We follow the divide-and-conquer principle, in which a large-scale problem is divided into several sub-problems. To remove the inter-dependencies between sub-problems, proper abstraction is applied. The resulting sub-problems can be solved either in parallel or in sequence. The mapping optimization problem on dynamic many-core systems is decomposed and solved separately, considering the system state and the architectural hierarchy. Experimental evaluations with several examples show that the proposed technique outperforms conventional meta-heuristics in both the optimality and the diversity of the resulting Pareto curve.
{"title":"Multi-objective mapping optimization via problem decomposition for many-core systems","authors":"Shin-Haeng Kang, Hoeseok Yang, Lars Schor, Iuliana Bacivarov, S. Ha, L. Thiele","doi":"10.1109/ESTIMedia.2012.6507026","DOIUrl":"https://doi.org/10.1109/ESTIMedia.2012.6507026","url":null,"abstract":"Due to the trend of many-core systems for dynamic multimedia applications, the problem size of mapping optimization gets bigger than ever making conventional meta-heuristics no longer effective. Thus, in this paper, we propose a problem decomposition approach for large scale optimization problems. We basically follow the divide-and-conquer concept, in which a large scale problem is divided into several sub-problems. To remove the inter-relationship between sub-problems, proper abstraction is applied. The divided sub-problems can be solved either in parallel or in a sequence. The mapping optimization problem on dynamic many-core systems is decomposed and solved separately considering the system state and architectural hierarchy. Experimental evaluations with several examples prove that the proposed technique outperforms the conventional meta-heuristics both in optimality and diversity of the optimized pareto curve.","PeriodicalId":431615,"journal":{"name":"2012 IEEE 10th Symposium on Embedded Systems for Real-time Multimedia","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123870840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Keynote: “Design space exploration and run-time resource management in the embedded multi-core era”
Pub Date: 2012-10-01 | DOI: 10.1109/ESTIMedia.2012.6507017
V. Zaccaria
It is widely understood that the next revolution in virtual platform-based design is the holistic optimization of hardware parameters, task mapping and scheduling, and application tuning for many-cores. As a community, we have learned that finding the best trade-off in terms of selected figures of merit can be achieved only by considering the hardware and software dimensions together, which means evaluating an enormous number of configurations, each characterized by a long simulation time. The problem worsens when dealing with small ecosystems such as embedded systems-on-chip, where the environment is too constrained to assume that a sophisticated run-time algorithm can be implemented to schedule access to resources efficiently. In this keynote I will explain why some of the newest findings in global optimization can be used to address virtual-platform design effectively. I will then describe how sophisticated algorithms based on response-surface prediction models can efficiently identify optimal configurations in a significant number of platform optimization scenarios. Finally, I will outline some research directions stemming from the MULTICUBE and 2PARMA EU projects that I believe will unlock the full potential of platform optimization.
{"title":"Keynote: “Design space exploration and run-time resource management in the embedded multi-core era”","authors":"V. Zaccaria","doi":"10.1109/ESTIMedia.2012.6507017","DOIUrl":"https://doi.org/10.1109/ESTIMedia.2012.6507017","url":null,"abstract":"It is widely understood that the next revolution of virtual platform-based design is the holistic optimization of hardware parameters, task mapping and scheduling, and application tuning for many-cores. As a community, we have learned that finding the best trade-off in terms of selected figures of merit can be achieved only by considering the integrated hardware and software dimensions, by evaluating an enormous number of configurations, each characterized by a long simulation time. The problem worsens when dealing with small ecosystems such as embedded systems-on-chip where the environment is too constrained to assume that a sophisticated run-time algorithm can be implemented to schedule efficiently the access to resources. In this keynote I will explain why some newest findings in global optimization can be used to address effectively virtual-platform design. I am going then to describe how sophisticated algorithms based on response-surface prediction models can efficiently identify optimal configurations in a significant number of platform optimization scenarios. Finally, I am going to outline some research directions stemming from the MULTICUBE and 2PARMA EU projects that I think will untap the full potential of platform optimization.","PeriodicalId":431615,"journal":{"name":"2012 IEEE 10th Symposium on Embedded Systems for Real-time Multimedia","volume":"149 15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129942174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards real-time applications in mobile web browsers
Pub Date: 2012-10-01 | DOI: 10.1109/ESTIMedia.2012.6507030
E. Aho, Kimmo Kuusilinna, T. Aarnio, Janne Pietiainen, Jari Nikara
WebGL and WebCL are web-targeted versions of the OpenGL ES and OpenCL standards. Using these standards, it is possible to better exploit the hardware resources of embedded systems from web browsers, allowing timely processing of audio, video, and graphics. WebGL excels in graphics applications, while WebCL fares better when more flexibility is required in execution platform selection, load balancing, data formats, control flow, or memory access patterns. This paper explores the potential for mobile web application acceleration using WebGL and particularly WebCL, which is currently under intense development. Where driver support is lacking, WebGL is used as a proxy to provide an estimate of the WebCL opportunity. Speedups on the order of 200x over JavaScript are demonstrated in best-case situations for a GPU target. In similar situations, CPU acceleration can reach 10x while running in a laptop browser. In addition, as building and optimizing a WebCL implementation is part of the reported work, an overview of the important development issues is given.
{"title":"Towards real-time applications in mobile web browsers","authors":"E. Aho, Kimmo Kuusilinna, T. Aarnio, Janne Pietiainen, Jari Nikara","doi":"10.1109/ESTIMedia.2012.6507030","DOIUrl":"https://doi.org/10.1109/ESTIMedia.2012.6507030","url":null,"abstract":"WebGL and WebCL are web targeted versions of OpenGL ES and OpenCL standards. Using these standards, it is possible to better exploit the hardware resources in embedded systems from web browsers allowing timely processing of audio, video, and graphics. WebGL excels in graphics applications while WebCL fares better when more flexibility is required in execution platform selection, load balancing, data formats, control flow, or memory access patterns. This paper explores the potential for mobile web application acceleration utilizing WebGL and particularly WebCL which is currently under intense development. Where driver support is lacking, WebGL is used as a proxy to provide an estimate of WebCL opportunity. Speedups in the order of 200x over JavaScript are demonstrated in best case situations for a GPU target. In similar situations, CPU acceleration can be 10x while running in a laptop browser. In addition, as building and optimizing a WebCL implementation is part of the reported work, an overview of the important development issues is given.","PeriodicalId":431615,"journal":{"name":"2012 IEEE 10th Symposium on Embedded Systems for Real-time Multimedia","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116575818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Power versus quality trade-offs for adaptive real-time applications
Pub Date: 2012-10-01 | DOI: 10.1109/ESTIMedia.2012.6507032
Andrew Nelson, B. Akesson, A. Molnos, Sj Pas, K. Goossens
Electronic devices are expected to accommodate ever more complex functionality. Portable devices, such as mobile phones, have experienced a rapid increase in functionality while being constrained by the amount of energy that can be stored in their batteries. Dynamic Voltage and Frequency Scaling (DVFS) is a common technique used to trade processor speed for a reduction in power consumption. Adaptive applications can reduce their output quality in exchange for a reduction in their execution time. This exchange has been shown to be useful for meeting temporal constraints, but its usefulness for reducing energy and power consumption has not been investigated. In this paper, we present a technique that uses existing DVFS methods to trade a quality decrease for lower power/energy consumption through an intermediate reduction in execution time. Our technique achieves this while meeting soft and/or hard time, energy, and power constraints. We demonstrate the applicability of our technique on an adaptive H.263 decoder application running on a predictable hardware platform prototyped on an FPGA. We further contribute an experimental evaluation of the H.263 decoder's scalable mechanisms and their ability to trade quality for time, energy, and power. Our experiments show that the quality trading technique achieves up to a 45% increase in the number of frames decoded for the same amount of energy, compared with frequency scaling alone, at a quality reduction of up to 22 dB in Peak Signal-to-Noise Ratio (PSNR).
{"title":"Power versus quality trade-offs for adaptive real-time applications","authors":"Andrew Nelson, B. Akesson, A. Molnos, Sj Pas, K. Goossens","doi":"10.1109/ESTIMedia.2012.6507032","DOIUrl":"https://doi.org/10.1109/ESTIMedia.2012.6507032","url":null,"abstract":"Electronic devices are expected to accommodate evermore complex functionality. Portable devices, such as mobile phones, have experienced a rapid increase in functionality, while at the same time being constrained by the amount of energy that may be stored in their batteries. Dynamic Voltage and Frequency Scaling (DVFS) is a common technique that is used to trade processor speed for a reduction in power consumption. Adaptive applications can reduce their output quality in exchange for a reduction in their execution time. This exchange has been shown to be useful for meeting temporal constraints, but its usefulness for reducing energy/power consumption has not been investigated. In this paper, we present a technique that uses existing DVFS methods to trade a quality decrease for lower power/energy consumption through an intermediary reduction in execution time. Our technique achieves this while meeting soft and/or hard time/energy/power constraints. We demonstrate the applicability of our technique on an adaptive H.263 decoder application, running on a predictable hardware platform that is prototyped on an FPGA. We further contribute an experimental evaluation of the H.263 decoder's scalable mechanisms, in their ability to trade quality for temporal/energy/power. From experimentation, we show that our quality trading technique is able to achieve up to a 45% increase in the number of frames decoded for the same amount of energy, in comparison to frequency scaling alone, but with a quality reduction of up to 22dB Peak Signal-to-Noise Ratio (PSNR).","PeriodicalId":431615,"journal":{"name":"2012 IEEE 10th Symposium on Embedded Systems for Real-time Multimedia","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116961484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
I/O aware task scheduling for energy harvesting embedded systems with PV and capacitor arrays
Pub Date: 2012-10-01 | DOI: 10.1109/ESTIMedia.2012.6507028
Kyungsoo Lee, T. Ishihara
The efficiency of a system powered by an energy generation source is important: high efficiency can reduce the cost of the system or extend its operating lifetime. High overall efficiency can be achieved through high generation efficiency, high consumption efficiency, or high transfer efficiency. Conventional maximum power point tracking (MPPT) techniques and multi-core scheduling methods do not consider the transfer efficiency in systems with multiple loads. This paper presents a generalized technique for task scheduling on a multi-core processor that takes the transfer efficiency to multiple loads into account. The target system supports dynamic reconfiguration of a photovoltaic/supercapacitor array to change the input voltage of the DC-DC converters feeding the loads. The proposed technique minimizes the power loss in the system's DC-DC converters and charger. Experiments with an actual application demonstrate that our approach reduces energy consumption by 17.7% compared with the conventional approach, which employs a processor with dynamic voltage and frequency scaling.
{"title":"I/O aware task scheduling for energy harvesting embedded systems with PV and capacitor arrays","authors":"Kyungsoo Lee, T. Ishihara","doi":"10.1109/ESTIMedia.2012.6507028","DOIUrl":"https://doi.org/10.1109/ESTIMedia.2012.6507028","url":null,"abstract":"The system efficiency using an energy generation source is important. The high efficiency can reduce the cost of the system or increase the lifetime of the system operation. The high efficiency can be achieved by a high generating efficiency, a high consumption efficiency or a high transferring efficiency. Conventional maximum power point tracking (MPPT) techniques and multi-core scheduling methods do not consider the transferring efficiency in the multiple load system. This paper presents a generalized technique for the task scheduling of a multi-core processor considering the transferring efficiency in multiple loads. The target system contains a functionality for dynamic reconfiguration of a photovoltaic/supercapacitor array to change the input voltage of DC-DC converters in multiple loads. The proposed technique minimizes the power loss in the DC-DC converters and charger of the system. Experiments with actual application demonstrate that our approach reduces the energy consumption by 17.7% over the conventional approach, which employs a dynamic voltage and frequency processor.","PeriodicalId":431615,"journal":{"name":"2012 IEEE 10th Symposium on Embedded Systems for Real-time Multimedia","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127706296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Memory-centric VDF graph transformations for practical FPGA implementation
Pub Date: 2012-10-01 | DOI: 10.1109/ESTIMedia.2012.6507023
Matthew Milford, J. McAllister
Realising memory-intensive applications such as image and video processing on FPGA requires the creation of complex, multi-level memory hierarchies to achieve real-time performance; however, commercial High Level Synthesis tools are unable to derive such structures automatically and hence cannot meet the demanding bandwidth and capacity constraints of these applications. Current approaches to this problem derive either single-level memory structures or very deep, highly inefficient hierarchies, leading in either case to high implementation cost, low performance, or both. This paper presents an enhancement to an existing MC-HLS synthesis approach which solves this problem; it exploits and eliminates data duplication at multiple levels of the generated hierarchy, leading to a reduction in the number of levels and ultimately to higher-performance, lower-cost implementations. When applied to the synthesis of C-based Motion Estimation, Matrix Multiplication and Sobel Edge Detection applications, this enables reductions in Block RAM and Look Up Table (LUT) cost of up to 25%, whilst simultaneously increasing throughput.
{"title":"Memory-centric VDF graph transformations for practical FPGA implementation","authors":"Matthew Milford, J. McAllister","doi":"10.1109/ESTIMedia.2012.6507023","DOIUrl":"https://doi.org/10.1109/ESTIMedia.2012.6507023","url":null,"abstract":"Realising memory intensive applications such as image and video processing on FPGA requires creation of complex, multi-level memory hierarchies to achieve real-time performance; however commerical High Level Synthesis tools are unable to automatically derive such structures and hence are unable to meet the demanding bandwidth and capacity constraints of these applications. Current approaches to solving this problem can only derive either single-level memory structures or very deep, highly inefficient hierarchies, leading in either case to one or more of high implementation cost and low performance. This paper presents an enhancement to an existing MC-HLS synthesis approach which solves this problem; it exploits and eliminates data duplication at multiple levels levels of the generated hierarchy, leading to a reduction in the number of levels and ultimately higher performance, lower cost implementations. When applied to synthesis of C-based Motion Estimation, Matrix Multiplication and Sobel Edge Detection applications, this enables reductions in Block RAM and Look Up Table (LUT) cost of up to 25%, whilst simultaneously increasing throughput.","PeriodicalId":431615,"journal":{"name":"2012 IEEE 10th Symposium on Embedded Systems for Real-time Multimedia","volume":"584 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132509814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}