Pub Date: 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088516
D. Nadezhkin, T. Stefanov
Process Networks (PNs) are a suitable parallel model of computation (MoC) for specifying embedded streaming applications in a parallel form that facilitates efficient mapping onto embedded parallel execution platforms. Unfortunately, specifying an application using a parallel MoC is a very difficult and highly error-prone task. To overcome the associated difficulties, an automated procedure exists for deriving specific polyhedral process networks (PPNs) from static affine nested loop programs (SANLPs). This procedure is implemented in the pn compiler. However, many applications, e.g., in multimedia and signal processing, have adaptive and dynamic behavior that cannot be expressed as SANLPs. Therefore, in order to handle more dynamic applications, this paper addresses the important question of whether some of the restrictions of SANLPs can be relaxed while keeping the ability to perform compile-time analysis and to derive PPNs. Achieving this would significantly extend the range of applications that can be parallelized in an automated way. The main contribution of this paper is a first approach for automated translation of affine nested loop programs with while-loops into input-output equivalent PPNs.
Title: Automatic derivation of polyhedral process networks from while-loop affine programs (2011 9th IEEE Symposium on Embedded Systems for Real-Time Multimedia)
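To make the distinction in the abstract above concrete, the following is a minimal sketch, not the authors' translation procedure or any part of the pn compiler. It contrasts a loop nest whose bounds are affine in the enclosing iterators (so its iteration count is known before execution) with a while-loop wrapped around an affine for-loop, whose number of passes depends on the input data. The function names and the smoothing kernel are hypothetical and chosen only for illustration.

```python
def triangular_count(n):
    """Static affine loop nest: the inner bound (i + 1) is affine in i,
    so the total number of iterations, n * (n + 1) / 2, is known statically."""
    iterations = 0
    for i in range(n):
        for j in range(i + 1):
            iterations += 1
    return iterations


def smooth_until_stable(signal, tolerance):
    """While-loop around an affine for-loop: the number of passes depends
    on the data, so it cannot be determined at compile time."""
    current = list(signal)
    passes = 0
    changed = True
    while changed:
        changed = False
        updated = list(current)
        # Inner loop has affine bounds and affine index expressions.
        for i in range(1, len(current) - 1):
            updated[i] = (current[i - 1] + current[i] + current[i + 1]) / 3
            if abs(updated[i] - current[i]) > tolerance:
                changed = True
        current = updated
        passes += 1
    return current, passes


if __name__ == "__main__":
    print(triangular_count(4))  # 10 iterations, computable without running the loop
    print(smooth_until_stable([0, 10, 0], 0.1))
```

The first function fits the static, compile-time analyzable class described in the abstract; the second shows the kind of data-dependent termination that falls outside it.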
Pub Date: 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088519
N. Fisher, Masud Ahmed
Real-time multimedia subsystems often require support for switching between different resource and application execution modes. To ensure that timing constraints are not violated during or after a mode change, real-time schedulability analysis is required. However, existing time-efficient multi-mode schedulability analysis techniques for application-only mode changes are not appropriate for subsystems that require changes in the resource execution behavior (e.g., processors with dynamic power modes). Furthermore, all existing multi-mode schedulability analyses that handle both resource and application mode changes have exponential time complexity and do not scale to subsystems with a moderate or large number of modes. We address the lack of tractable schedulability analysis for such subsystems by proposing a model for characterizing multiple resource and application modes and by deriving a sufficient schedulability test that has pseudo-polynomial time complexity. Simulation results show that our proposed schedulability test, when compared with previously proposed approaches, requires significantly less time and is just as precise.
Title: Tractable real-time schedulability analysis for mode changes under temporal isolation (2011 9th IEEE Symposium on Embedded Systems for Real-Time Multimedia)
Pub Date: 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088531
Chi-Bang Kuan, Shao-Chung Wang, Wen-Li Shih, Kun-Hsien Tsai, S. Lai, Jenq-Kuen Lee
A Bokeh application renders the blur, or the aesthetic quality of blurring, in out-of-focus areas of an image. The quality of the out-of-focus effect depends on the accuracy of the depth information and on the blurring effects produced by image postprocessing. Current stereo vision techniques, however, consume a huge amount of processing time to obtain accurate depth information. In this paper, we present a case study on parallelizing a Bokeh application on an embedded multicore platform, which features one MPU and one DSP sub-system consisting of two VLIW DSP processors. The Bokeh application employs a Belief Propagation method to obtain depth information of input images and uses the information to generate output images with an out-of-focus effect. This study also illustrates how to deliver performance for applications on embedded multicore systems. To sustain the heavy computation requirements of the stereo vision techniques, DSPs with their SIMD instructions are leveraged to exploit data parallelism in critical kernels. In addition, DMAs on the multicore system are used to facilitate data transmission between processors. Access to SIMD and DMAs is provided by two essential programming models we developed for embedded multicore systems. Our work also gives firsthand experience of how C++ classes and abstractions can be used to help parallelize applications on embedded multicore DSP systems. Finally, in our experiments, we utilize DSPs, SIMD and DMAs to achieve speedups of 1.67 and 2.75, respectively, for two key components of the Bokeh application.
Title: Parallelization of a Bokeh application on embedded multicore DSP systems (2011 9th IEEE Symposium on Embedded Systems for Real-Time Multimedia)
Pub Date: 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088530
W. Che, Karam S. Chatha
Scratch Pad Memories (SPM) have emerged as an alternative to caches in embedded processor architectures due to their lower power consumption, smaller chip area and superior performance. However, the advantages of SPM come at the expense of increased load on the programmer, as she is responsible for memory management. Consequently, there is a need for novel compilation techniques for mapping applications onto SPM-enhanced embedded processors. Stream programs (which describe a large class of embedded applications) demonstrate stable memory access patterns and are particularly suitable for SPM-based processors. In this paper we present a heuristic approach for scheduling and compiling streaming applications (modeled by synchronous data flow graphs) for SPM-enhanced processors. The technique maximizes application performance by minimizing the code overlay overheads that are introduced when executing a large code base on a smaller SPM. We also present an extension of our approach that further reduces the overheads by selective code pre-fetching. The effectiveness of our approaches is evaluated by compiling ten streaming applications onto one Synergistic Processing Engine (SPE) of the IBM Cell processor.
Title: Scheduling of stream programs onto SPM enhanced processors with code overlay (2011 9th IEEE Symposium on Embedded Systems for Real-Time Multimedia)
Pub Date: 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088528
Yi-Hsuan Chiang, Polly Huang, Homer H. Chen
The multi-rate scalable video codecs SVC and MDC are plausible solutions for dealing with the heterogeneous environment of the Internet. They have, however, also given rise to a wide debate over which one supports P2P IPTV systems more efficiently. Our goal in this work is to resolve the debate by providing a quantitative comparison of P2P IPTV systems given different choices of coding schemes and P2P network formations. The answer is rather subtle. MDC-based systems, though they outperform SVC-based ones in terms of network throughput under certain network formations with bottlenecks, suffer from lower perceptual quality in terms of PSNR due to coding inefficiency. The results of this paper provide not only a lesson for the design of large-scale heterogeneous P2P IPTV systems but also strong evidence that a poor choice of codec at the higher level might overshadow the network-level design, and that the codec and network formation components ought to be co-designed for optimal user experience.
Title: SVC or MDC? That's the question (2011 9th IEEE Symposium on Embedded Systems for Real-Time Multimedia)
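For reference, PSNR, the quality metric named in the abstract above, is conventionally computed as 10 * log10(MAX^2 / MSE), where MSE is the mean squared error between the reference and decoded samples and MAX is the peak sample value. The sketch below assumes 8-bit samples (peak 255) stored in flat lists; the function name is hypothetical, and nothing here is taken from the paper's evaluation setup.

```python
import math


def psnr(reference, decoded, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    print(psnr([10, 20], [12, 18]))
```

A higher PSNR means the decoded signal is closer to the reference, which is why a less efficient codec at the same bit rate shows up as a lower PSNR.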
Pub Date: 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088533
Liang-Gee Chen
Today's user demand for multimedia has moved to an anywhere, anytime paradigm. This ubiquitous usage model creates many requirements for embedded multimedia system design, and the traditional module-wise design concept will not suffice. In this talk, the system design view of modern VLSI architectures for multimedia applications, including H.264, scalable video coding (SVC) and stereo/3D video coding, will be reviewed. In addition, several emerging applications in which machine-to-machine and machine-to-human design factors also become important, such as distributed video coding (DVC), free-viewpoint TV and intelligent image recognition, will be introduced. With the growth of research on these embedded architectures, we can expect a fruitful future for multimedia ICs and systems.
Title: System perspective on embedded multimedia (2011 9th IEEE Symposium on Embedded Systems for Real-Time Multimedia)
Pub Date: 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088515
Namseung Lee, Sung-Soo Lim
As products based on the Android platform have spread widely in the consumer electronics market, the need for systematic performance analysis has increased significantly. Conventional approaches rely on publicly available performance analysis tools from the Android SDK or the Linux community, such as DDMS (Dalvik Debug Monitor Server), LTTng, OProfile, and Ftrace. Though these approaches provide analysis or measurement results for certain aspects and specific software layers, none of them gives a view of performance across all software layers. For example, once a method in an Android application turns out to be a performance bottleneck, it is very hard to locate the code fragments that actually cause the bottleneck across the software layers: the application code does not reveal the direct reason for the bottleneck, which is often caused by the underlying native layers, including kernel events.
Title: A whole layer performance analysis method for Android platforms (2011 9th IEEE Symposium on Embedded Systems for Real-Time Multimedia)
Pub Date: 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088523
Hua-Wei Fang, Mi-Yen Yeh, Pei-Lun Suei, Tei-Wei Kuo
This work is motivated by the strong demand for flash-friendly index designs that resolve reliability and performance concerns for data manipulation over flash memory. Unlike past work, we explore the impacts of hot-data access and sibling-link updates on a tree index structure over flash memory. In particular, a flash-friendly B+-tree, referred to as a Durable B+-tree, is proposed to improve not only the endurance but also the performance of a tree index structure over flash memory. The capability of the proposed methodology and index design was evaluated in a series of experiments, in which a significant improvement in endurance was achieved compared with past work.
Title: A flash-friendly B+-tree with endurance-awareness (2011 9th IEEE Symposium on Embedded Systems for Real-Time Multimedia)
Pub Date: 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088517
Kelvin K. Yue, Soumia Ghalim, Zheng Li, Frank Lockom, Shangping Ren, Lei Zhang, Xiaowei Li
Computation-intensive multimedia applications are emerging on mobile devices. System-on-Chip (SoC) designs offer high performance at a decreased size for these devices. An SoC often integrates tens of cores and uses a Network-on-Chip (NoC) as its communication infrastructure. To ensure a high yield of manycore processors, core-level redundancy is often used as an effective approach to improve the reliability of manycore chips. However, when defective cores are replaced by redundant ones, the NoC topology changes. As a result, an application fine-tuned to the timing parameters of one topology may not meet the expected timing behavior under the new one. To address this issue, we first define a metric that measures the timing resemblance between different NoC topologies. Based on this metric, we develop a greedy algorithm to reconfigure a defect-tolerant manycore platform and form a unified application-specific virtual topology on which the timing variations caused by the reconfiguration are minimized. Our simulation results clearly indicate the effectiveness of the developed algorithm.
Title: A greedy approach to tolerate defect cores for multimedia applications (2011 9th IEEE Symposium on Embedded Systems for Real-Time Multimedia)
Pub Date: 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088529
Jinwoo Kim, Tae-ho Shin, S. Ha, Hyunok Oh
In this paper, we focus on the throughput-constrained parallel execution of synchronous data flow graphs. We assume static mapping and dynamic scheduling of nodes, in contrast to related work that assumes static scheduling. Since the scheduling order in dynamic scheduling depends on the priority assignment, three priority assignment methods are proposed and compared. If no task execution time varies at run time, priority assignment is another way of storing a static schedule. We propose a static mapping technique that minimizes the resource overhead, considering both the processor cost and the total buffer size on all arcs, under a given throughput constraint. Since the problem is NP-complete, a multi-objective evolutionary algorithm is used to discover the mapping that minimizes the processor cost and the buffer requirement simultaneously. The experimental results show that the proposed technique requires fewer resources or achieves higher average throughput than previous approaches.
Title: Resource minimized static mapping and dynamic scheduling of SDF graphs (2011 9th IEEE Symposium on Embedded Systems for Real-Time Multimedia)
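As background for the synchronous data flow (SDF) graphs mentioned in the abstract above, the sketch below computes the repetition vector of a small SDF graph from its balance equations: for every arc on which the source produces p tokens per firing and the destination consumes c tokens per firing, q[src] * p = q[dst] * c must hold. This is the standard consistency check for SDF graphs, not the paper's mapping or scheduling technique. The function name, the arc tuple layout and the example graph are hypothetical, and the sketch assumes a connected graph.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(actors, arcs):
    """Return the smallest positive integer firing counts satisfying the
    balance equations. arcs is a list of (src, dst, produced, consumed)."""
    neighbours = {actor: [] for actor in actors}
    for src, dst, produced, consumed in arcs:
        # q[dst] = q[src] * produced / consumed, and the inverse for src.
        neighbours[src].append((dst, Fraction(produced, consumed)))
        neighbours[dst].append((src, Fraction(consumed, produced)))

    rate = {}
    for start in actors:
        if start in rate:
            continue
        rate[start] = Fraction(1)
        stack = [start]
        while stack:
            actor = stack.pop()
            for other, ratio in neighbours[actor]:
                expected = rate[actor] * ratio
                if other not in rate:
                    rate[other] = expected
                    stack.append(other)
                elif rate[other] != expected:
                    raise ValueError("inconsistent SDF graph: no repetition vector")

    # Scale the fractional rates to the smallest integer solution.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in rate.values()), 1)
    counts = {actor: int(r * scale) for actor, r in rate.items()}
    divisor = reduce(gcd, counts.values(), 0)
    return {actor: count // divisor for actor, count in counts.items()}


if __name__ == "__main__":
    # A produces 2 tokens per firing, B consumes 3; B produces 1, C consumes 2.
    print(repetition_vector(["A", "B", "C"],
                            [("A", "B", 2, 3), ("B", "C", 1, 2)]))
```

One iteration of a schedule fires each actor as many times as its entry in this vector, which is the unit over which throughput and buffer sizes on the arcs are usually measured.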