Pub Date : 2009-11-17 DOI: 10.1109/ESTMED.2009.5336829
Mohammed G. Khatib, H. W. V. Dijk
An exciting class of storage devices is emerging: the class of Micro-Electro-Mechanical storage Systems (MEMS). Properties of MEMS-based storage devices include high density, small form factor, and low power. The use of this type of device in mobile infotainment systems, such as video cameras, is not at all obvious. We must explore their configuration and assess their benefit with respect to existing devices, such as Flash.
{"title":"Fast configuration of MEMS-based storage devices for streaming applications","authors":"Mohammed G. Khatib, H. W. V. Dijk","doi":"10.1109/ESTMED.2009.5336829","DOIUrl":"https://doi.org/10.1109/ESTMED.2009.5336829","journal":{"name":"2009 IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia"},"publicationDate":"2009-11-17"}
Pub Date : 2009-11-17 DOI: 10.1109/ESTMED.2009.5336821
Yang Yang, M. Geilen, T. Basten, S. Stuijk, H. Corporaal
Synchronous dataflow graphs (SDFGs) are widely used to model streaming applications such as signal processing and multimedia applications. These are often implemented on resource-constrained embedded platforms ranging from PDAs and cell phones to automobile equipment and printing systems. Trade-off analysis between resource usage and performance is critical in the life cycle of those products, from tailoring platforms to target applications at design time to resource management at runtime. We present a trade-off analysis method for SDFGs based on model-checking techniques and leveraging knowledge from the dataflow domain. We develop results to prune the state space of an SDFG for multi-objective model checking without losing optimality. To achieve scalability to large state spaces, we combine these pruning techniques with pragmatic heuristics. We evaluate our techniques with two sets of experiments. One set shows that we can now do throughput-storage trade-off analysis for shared-memory architectures, with reductions in memory usage of 10–50% compared to existing analysis based on distributed memory. A second set of experiments shows how our techniques support design-space exploration for the digital datapath of a professional printer system. Analysis times range from less than a second to at most several minutes.
{"title":"Exploring trade-offs between performance and resource requirements for synchronous dataflow graphs","authors":"Yang Yang, M. Geilen, T. Basten, S. Stuijk, H. Corporaal","doi":"10.1109/ESTMED.2009.5336821","DOIUrl":"https://doi.org/10.1109/ESTMED.2009.5336821","journal":{"name":"2009 IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia"},"publicationDate":"2009-11-17"}
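The throughput-storage trade-off mentioned in the abstract above can be pictured as a set of Pareto-optimal design points: configurations for which no alternative needs less storage while delivering at least the same throughput. The following is a minimal sketch of that filtering step only, not the authors' model-checking method. The function names and the sample (storage, throughput) values are hypothetical and chosen purely for illustration.

```python
from typing import List, Tuple

# (storage in buffer tokens, throughput in iterations per time unit); hypothetical units
Point = Tuple[int, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one.

    Storage is minimized, throughput is maximized.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates, sorted by storage."""
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(set(front))


if __name__ == "__main__":
    # Hypothetical design points, not taken from the paper.
    candidates = [(4, 0.10), (6, 0.25), (7, 0.20), (9, 0.25), (10, 0.50)]
    print(pareto_front(candidates))  # [(4, 0.1), (6, 0.25), (10, 0.5)]
```

Here (7, 0.20) and (9, 0.25) are dropped because (6, 0.25) offers at least the same throughput with less storage. The quadratic filter is adequate for small point sets; the paper's contribution lies in pruning the state space before such points are enumerated.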
Pub Date : 2009-11-17 DOI: 10.1109/ESTMED.2009.5336815
L. A. Bathen, Yongjin Ahn, N. Dutt, S. Pasricha
The increasing demand for low-power and high-performance multimedia embedded systems has motivated the need for effective solutions to satisfy application bandwidth and latency requirements under a tight power budget. As technology scales, it is imperative that applications are optimized to take full advantage of the underlying resources and meet both power and performance requirements. We propose a methodology capable of discovering and enabling parallelism opportunities via code transformations, efficiently distributing the computational load across resources, and minimizing unnecessary data transfers. Our approach decomposes the application's tasks into smaller units of computation called kernels, which are distributed and pipelined across the different processing resources. We exploit the ideas of inter-kernel data reuse to minimize unnecessary data transfers between kernels and early execution edges to drive performance. Our experimental results on a JPEG2000 case study show up to 80% performance improvement and 60% dynamic power reduction over standard application mapping approaches.
{"title":"Inter-kernel data reuse and pipelining on chip-multiprocessors for multimedia applications","authors":"L. A. Bathen, Yongjin Ahn, N. Dutt, S. Pasricha","doi":"10.1109/ESTMED.2009.5336815","DOIUrl":"https://doi.org/10.1109/ESTMED.2009.5336815","journal":{"name":"2009 IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia"},"publicationDate":"2009-11-17"}
Over the last decade there has been an increasing emphasis on driver-assistance systems for the automotive domain. In this paper we report our work on designing a camera-based surveillance system embedded in a “smart” car door. Such a camera is used to monitor the ambient environment outside the car (e.g., the presence of obstacles such as approaching cars or cyclists who might collide with the car door if opened) and to automatically control the car door operations. This is an enhancement to the currently available side-view mirrors, which the driver or passenger checks before opening the car door. The focus of this paper is on fast and robust image processing algorithms specifically targeting such a smart car door system. The requirement is to quickly detect traffic objects of interest from gray-scale images captured by omnidirectional cameras. Whereas known algorithms for object extraction from the image processing literature rely on color information and are sensitive to shadows and illumination changes, our proposed algorithms are highly robust, can operate on gray-scale images (color images are not available in our setup), and output results in real time. To illustrate this, we present a number of experimental results based on image sequences captured from real-life traffic scenarios.
{"title":"Robust image processing for an omnidirectional camera-based smart car door","authors":"C. Scharfenberger, S. Chakraborty, G. Färber","doi":"10.1145/2362336.2362354","DOIUrl":"https://doi.org/10.1145/2362336.2362354","journal":{"name":"2009 IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia"},"publicationDate":"2009-11-17"}
Pub Date : 2009-11-17 DOI: 10.1109/ESTMED.2009.5336819
L. Gauthier, T. Ishihara
Memory accesses are a major cause of energy consumption in embedded systems, and the stack is a frequent target of data accesses. This paper presents a fully software technique that aims to reduce the stack-related energy consumption by allocating and transferring frames, or parts of frames, between a scratch-pad memory and the main memory. The technique uses an integer linear formulation of the problem to find, at compile time, the optimal management of the frames. The technique is also extended to integrate existing methods that deal with static memory objects and others that deal with recursive functions. Experimental results show that our technique effectively exploits an available scratch-pad memory space of only one half of what the stack requires, reducing the stack-related energy consumption by more than 90% for several applications and by 84% on average, compared to the case where all the frames of the stack are placed in the main memory.
{"title":"Optimal stack frame placement and transfer for energy reduction targeting embedded processors with scratch-pad memories","authors":"L. Gauthier, T. Ishihara","doi":"10.1109/ESTMED.2009.5336819","DOIUrl":"https://doi.org/10.1109/ESTMED.2009.5336819","journal":{"name":"2009 IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia"},"publicationDate":"2009-11-17"}
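To give a feel for the kind of compile-time optimization the abstract above describes, the sketch below solves a heavily simplified version of the problem: each stack frame is either placed entirely in the scratch-pad memory or left in main memory, and the goal is to maximize the energy saved under a capacity limit. This is a plain 0/1 knapsack solved by dynamic programming. It is an assumption made for illustration and not the paper's integer linear formulation, which also covers partial frames and run-time transfers. The frame names, sizes, and saving values are hypothetical.

```python
from typing import List, Tuple

# (frame name, size in bytes, energy saved if the frame lives in the scratch-pad)
Frame = Tuple[str, int, int]


def place_frames(frames: List[Frame], spm_bytes: int) -> Tuple[int, List[str]]:
    """Pick the frames to place in the scratch-pad so that total saving is maximal.

    Classic 0/1 knapsack: best[c] holds the best (saving, chosen names) that
    fits into c bytes of scratch-pad capacity.
    """
    best: List[Tuple[int, List[str]]] = [(0, []) for _ in range(spm_bytes + 1)]
    for name, size, saving in frames:
        # Walk capacities downwards so that each frame is used at most once.
        for c in range(spm_bytes, size - 1, -1):
            candidate = best[c - size][0] + saving
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - size][1] + [name])
    return best[spm_bytes]


if __name__ == "__main__":
    # Hypothetical frames; the stack needs 320 bytes in total.
    frames = [("main", 64, 50), ("fft", 128, 200), ("filter", 96, 120), ("log", 32, 10)]
    # Scratch-pad capacity is half of the total stack requirement.
    print(place_frames(frames, 160))  # (210, ['fft', 'log'])
```

A real formulation would hand the same decision variables to an ILP solver, which makes it straightforward to add constraints for frame transfers, static objects, and recursion as the abstract mentions.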
Pub Date : 2009-11-17 DOI: 10.1109/ESTMED.2009.5336828
Wolfgang Haid, Lars Schor, Kai Huang, Iuliana Bacivarov, L. Thiele
As single-processor systems are ceasing to scale effectively, multi-processor systems are becoming more and more popular. While there are many challenges in designing multi-processor systems in hardware, writing efficient parallel applications that utilize the computing capability of multiple processors may prove to be even more challenging. In this paper, we introduce a framework for efficiently executing applications expressed as Kahn process networks on multi-processor systems using protothreads and windowed FIFOs. We show that application developers can use this framework to achieve considerable speed-ups on the Cell Broadband Engine without needing to write architecture-specific code.
{"title":"Efficient execution of Kahn process networks on multi-processor systems using protothreads and windowed FIFOs","authors":"Wolfgang Haid, Lars Schor, Kai Huang, Iuliana Bacivarov, L. Thiele","doi":"10.1109/ESTMED.2009.5336828","DOIUrl":"https://doi.org/10.1109/ESTMED.2009.5336828","journal":{"name":"2009 IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia"},"publicationDate":"2009-11-17"}
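The execution model named in the abstract above, processes that communicate only through FIFOs and give up the processor when a read or write would block, can be sketched with Python generators standing in for protothreads (both are stackless coroutines). This is an illustrative analogy rather than the framework's API: the class and function names are hypothetical, and the bounded queue below is an ordinary FIFO, not the windowed FIFO the paper uses.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List


class Fifo:
    """Bounded channel; processes poll it and yield instead of blocking."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: Deque[int] = deque()

    def can_write(self) -> bool:
        return len(self.items) < self.capacity

    def can_read(self) -> bool:
        return len(self.items) > 0


def producer(out: Fifo, values: List[int]) -> Iterator[None]:
    for v in values:
        while not out.can_write():
            yield  # channel full: hand control back to the scheduler
        out.items.append(v)


def square(inp: Fifo, out: Fifo, count: int) -> Iterator[None]:
    for _ in range(count):
        while not inp.can_read():
            yield  # channel empty: hand control back to the scheduler
        v = inp.items.popleft()
        while not out.can_write():
            yield
        out.items.append(v * v)


def consumer(inp: Fifo, count: int, result: List[int]) -> Iterator[None]:
    for _ in range(count):
        while not inp.can_read():
            yield
        result.append(inp.items.popleft())


def run(processes: Iterable[Iterator[None]]) -> None:
    """Round-robin scheduler: resume each process until all have finished."""
    active = list(processes)
    while active:
        for p in list(active):
            try:
                next(p)
            except StopIteration:
                active.remove(p)


def run_pipeline(values: List[int]) -> List[int]:
    a, b = Fifo(2), Fifo(2)
    result: List[int] = []
    n = len(values)
    run([producer(a, values), square(a, b, n), consumer(b, n, result)])
    return result


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3, 4, 5]))  # [1, 4, 9, 16, 25]
```

Because each process only reads from and writes to its own channels, the result does not depend on the order in which the scheduler resumes the processes, which is the determinism property that makes Kahn process networks attractive for multi-processor mapping.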
Pub Date : 2009-11-17 DOI: 10.1109/ESTMED.2009.5336825
T. Cucinotta, Luca Abeni, L. Palopoli, Fabio Checconi
Multimedia applications are often characterised by implicit temporal constraints but, in many cases, they are not programmed using any specialised real-time API. These “Legacy applications” have no way to communicate their temporal constraints to the OS kernel, and their quality of service (QoS), being necessarily linked to the temporal behaviour, fails to satisfy acceptable standards. In this paper we propose an innovative way of dealing with these applications, based on the combination of an on-line identification mechanism (which extracts from high-level observations such important parameters as the execution rate) and an adaptive scheduler (specialised for legacy applications) that identifies the correct amount of CPU needed by each application. Preliminary experimental results are reported, proving the effectiveness of the proposed idea in providing a widely used multimedia player on Linux with appropriate QoS guarantees, through an appropriate choice of the scheduling parameters. Finally, a detailed road-map is presented with the possible extensions to the approach.
{"title":"The wizard of OS: a heartbeat for Legacy multimedia applications","authors":"T. Cucinotta, Luca Abeni, L. Palopoli, Fabio Checconi","doi":"10.1109/ESTMED.2009.5336825","DOIUrl":"https://doi.org/10.1109/ESTMED.2009.5336825","journal":{"name":"2009 IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia"},"publicationDate":"2009-11-17"}
Pub Date : 2009-11-17 DOI: 10.1109/ESTMED.2009.5336822
Ju Ren, Yi He, Wei Wu, M. Wen, N. Wu, Chunyuan Zhang
Real-time encoding of high-definition H.264 video is a challenge for current embedded programmable processors. Emerging stream processing methods supported by most GPUs and programmable processors provide a powerful mechanism to achieve surprisingly high performance in media and signal processing, which brings an opportunity to deal with this challenge. However, traditional serial CAVLC has highly input-dependent execution and precedence constraints, which become a bottleneck for implementing an H.264 encoder efficiently. This paper presents a software parallel CAVLC encoder based on stream processing. Many approaches are explored to overcome the restrictions on parallelizing CAVLC caused by data dependencies and branch/loop instructions. Experimental results show that our parallel CAVLC encoder on two stream processing platforms, STORM and a GPU, achieves speedups of 3.03x and 2.08x, respectively, over the original serial CAVLC. Finally, the proposed parallel CAVLC encoder coupled with a stream processor enables real-time encoding of 1080p H.264 video.
{"title":"Software parallel CAVLC encoder based on stream processing","authors":"Ju Ren, Yi He, Wei Wu, M. Wen, N. Wu, Chunyuan Zhang","doi":"10.1109/ESTMED.2009.5336822","DOIUrl":"https://doi.org/10.1109/ESTMED.2009.5336822","journal":{"name":"2009 IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia"},"publicationDate":"2009-11-17"}
Pub Date : 2009-11-17 DOI: 10.1109/ESTMED.2009.5336820
Hui-Ting Yang, Jian-Wen Chen, Huang-Chih Kuo, Y. Lin
For all video applications, large amounts of data are processed within a bounded time. These data are usually stored in a low-cost, slow external DRAM, which results in a high memory bandwidth requirement. The memory bandwidth will dominate the system performance, especially for applications running on embedded systems. In this paper, we propose an effective dictionary-based compression and decompression algorithm for display frames in a video decoding system and present its hardware implementation. We have integrated the proposed design into an H.264/AVC video decoder. Simulation results show that the proposed algorithm achieves a compression ratio of 54% and a memory traffic reduction of 34% when decoding 1080HD video. It is much more effective than all previous work.
{"title":"An effective dictionary-based display frame compressor","authors":"Hui-Ting Yang, Jian-Wen Chen, Huang-Chih Kuo, Y. Lin","doi":"10.1109/ESTMED.2009.5336820","DOIUrl":"https://doi.org/10.1109/ESTMED.2009.5336820","journal":{"name":"2009 IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia"},"publicationDate":"2009-11-17"}
Pub Date : 2009-11-17 DOI: 10.1109/ESTMED.2009.5336818
Koichi Hattori, Hiroshi Tsutsui, H. Ochi, Yukihiro Nakamura
JPEG XR is an emerging image coding standard, based on HD Photo developed by Microsoft. It offers compression performance twice as high as that of the de facto image coding system, namely JPEG, and also has an advantage over JPEG 2000 in terms of computational cost. JPEG XR is expected to become widespread in many devices, including embedded systems, in the near future. In this paper, we propose a novel architecture for JPEG XR encoding. In previous architectures, entropy coding was the throughput bottleneck because it was implemented as a sequential algorithm to handle data with dependencies. We found that there is no dependency in intra-macroblock data, so we could safely pipeline all the encoding processes, including the entropy coding. The proposed fully pipelined architecture achieves 100 M pixel/sec at 125 MHz, which previous work could not achieve.
{"title":"A high-throughput pipelined architecture for JPEG XR encoding","authors":"Koichi Hattori, Hiroshi Tsutsui, H. Ochi, Yukihiro Nakamura","doi":"10.1109/ESTMED.2009.5336818","DOIUrl":"https://doi.org/10.1109/ESTMED.2009.5336818","journal":{"name":"2009 IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia"},"publicationDate":"2009-11-17"}
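The throughput figure quoted in the abstract above (100 M pixel/sec at 125 MHz) can be turned into two more intuitive numbers with simple arithmetic: pixels processed per clock cycle, and frames per second for a given frame size. The short sketch below does that calculation. The 1920x1080 frame size is an assumption chosen for illustration and is not stated in the abstract, and the function names are hypothetical.

```python
def pixels_per_cycle(pixels_per_second: float, clock_hz: float) -> float:
    """Average number of pixels the pipeline retires per clock cycle."""
    return pixels_per_second / clock_hz


def frames_per_second(pixels_per_second: float, width: int, height: int) -> float:
    """Frame rate sustained at the given pixel throughput and frame size."""
    return pixels_per_second / (width * height)


if __name__ == "__main__":
    throughput = 100e6  # 100 M pixel/sec, from the abstract
    clock = 125e6       # 125 MHz, from the abstract
    print(pixels_per_cycle(throughput, clock))                   # 0.8 pixels per cycle
    print(round(frames_per_second(throughput, 1920, 1080), 1))   # 48.2 (assumed frame size)
```

A rate of 0.8 pixels per cycle is consistent with a pipeline that accepts close to one pixel per clock, which is what removing the sequential entropy-coding bottleneck is meant to enable.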