Reliable performance analysis of a multicore multithreaded system-on-chip
S. Schliecker, Mircea Negrean, G. Nicolescu, P. Paulin, R. Ernst
DOI: 10.1145/1450135.1450172

Formal performance analysis is now regularly applied in the design of distributed embedded systems such as automotive electronics, where it contributes greatly to improved predictability and platform robustness of complex networked systems. Even though it could also be highly beneficial in MPSoC design, formal performance analysis has so far been difficult to apply, because the classical task communication model does not cover processor-memory traffic, which is an integral part of MPSoC timing. Introducing memory accesses as individual transactions under the classical model has been shown to be inefficient, and previous approaches work well only under strict orthogonalization of different traffic streams. Recent research has presented extensions of the classical task model and a corresponding analysis that covers the performance implications of shared memory traffic. In this paper we present a multithreaded multiprocessor platform and a multimedia application. We conduct performance analysis using the new analysis options and specifically benchmark the quality of the available approach. Our experiments show that corner-case coverage can now be provided with very high accuracy, allowing architectural alternatives to be investigated quickly.
{"title":"Reliable performance analysis of a multicore multithreaded system-on-chip","authors":"S. Schliecker, Mircea Negrean, G. Nicolescu, P. Paulin, R. Ernst","doi":"10.1145/1450135.1450172","DOIUrl":"https://doi.org/10.1145/1450135.1450172","url":null,"abstract":"Formal performance analysis is now regularly applied in the design of distributed embedded systems such as automotive electronics, where it greatly contributes to an improved predictability and platform robustness of complex networked systems. Even though it might be highly beneficial also in MpSoC design, formal performance analysis could not easily be applied so far, because the classical task communication model does not cover processor-memory traffic, which is an integral part of MpSoC timing. Introducing memory accesses as individual transactions under the classical model has shown to be inefficient, and previous approaches work well only under strict orthogonalization of different traffic streams.\u0000 Recent research has presented extensions of the classical task model and a corresponding analysis that covers performance implications of shared memory traffic. In this paper we present a multithreaded multiprocessors platform and multimedia application. We conduct performance analysis using the new analysis options and specifically benchmark the quality of the available approach. Our experiments show that corner case coverage can now be supplied with a very high accuracy, allowing to quickly investigate architectural alternatives.","PeriodicalId":300268,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124074706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Don't forget memories: a case study redesigning a pattern counting ASIC circuit for FPGAs
David Sheldon, F. Vahid
DOI: 10.1145/1450135.1450171

Modern embedded compute platforms increasingly contain both microprocessors and field-programmable gate arrays (FPGAs). The FPGAs may implement accelerators or other circuits to speed up performance. Many such circuits were previously designed for acceleration via application-specific integrated circuits (ASICs). Redesigning an ASIC circuit for FPGA implementation involves several challenges. We describe a case study that highlights a common challenge related to memories. The study involves converting a pattern counting circuit architecture, based on a pipelined binary tree and originally designed for ASIC implementation, into a circuit suitable for FPGAs. The original ASIC-oriented circuit, when mapped to a Spartan-3E FPGA, could process 10 million patterns per second and handle up to 4,096 patterns. The redesigned circuit could instead process 100 million patterns per second and handle up to 32,768 patterns, representing a 10x performance improvement and a 4x utilization improvement. The redesign involved partitioning large memories into smaller ones at the expense of redundant control logic. Through this and other case studies, design patterns may emerge that aid designers in redesigning ASIC circuits for FPGAs, as well as in building new high-performance and efficient circuits for FPGAs.
{"title":"Don't forget memories: a case study redesigning a pattern counting ASIC circuit for FPGAs","authors":"David Sheldon, F. Vahid","doi":"10.1145/1450135.1450171","DOIUrl":"https://doi.org/10.1145/1450135.1450171","url":null,"abstract":"Modern embedded compute platforms increasingly contain both microprocessors and field-programmable gate arrays (FPGAs). The FPGAs may implement accelerators or other circuits to speedup performance. Many such circuits have been previously designed for acceleration via application-specific integrated circuits (ASICs). Redesigning an ASIC circuit for FPGA implementation involves several challenges. We describe a case study that highlights a common challenge related to memories. The study involves converting a pattern counting circuit architecture, based on a pipelined binary tree and originally designed for ASIC implementation, into a circuit suitable for FPGAs. The original ASIC-oriented circuit, when mapped to a Spartan 3e FPGA, could process 10 million patterns per second and handle up to 4,096 patterns. The redesigned circuit could instead process 100 million patterns per second and handle up to 32,768 patterns, representing a 10x performance improvement and a 4x utilization improvement. The redesign involved partitioning large memories into smaller ones at the expense of redundant control logic. Through this and other case studies, design patterns may emerge that aid designers in redesigning ASIC circuits for FPGAs as well as in building new high-performance and efficient circuits for FPGAs.","PeriodicalId":300268,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134471601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hardware/software partitioning of floating point software applications to fixed-pointed coprocessor circuits
L. Saldanha, Roman L. Lysecky
DOI: 10.1145/1450135.1450148

While hardware/software partitioning has been shown to provide significant performance gains, most hardware/software partitioning approaches are limited to partitioning computational kernels that use integer or fixed-point implementations. Software developers often initially develop an application using built-in floating point representations and later convert the application to a fixed-point representation - a potentially time-consuming process. In this paper, we present a hardware/software partitioning approach for floating point applications that eliminates the need for developers to rewrite software applications for fixed-point implementations. Instead, the proposed approach incorporates efficient, configurable floating-point-to-fixed-point and fixed-point-to-floating-point hardware converters at the boundary between the hardware coprocessors and memory. This effectively separates the system into a floating point domain, consisting of the microprocessor and memory subsystem, and a fixed point domain, consisting of the partitioned hardware coprocessors, thereby providing an efficient and rapid method for implementing fixed-point hardware coprocessors.
{"title":"Hardware/software partitioning of floating point software applications to fixed-pointed coprocessor circuits","authors":"L. Saldanha, Roman L. Lysecky","doi":"10.1145/1450135.1450148","DOIUrl":"https://doi.org/10.1145/1450135.1450148","url":null,"abstract":"While hardware/software partitioning has been shown to provide significant performance gains, most hardware/software partitioning approaches are limited to partitioning computational kernels utilizing integers or fixed point implementations. Software developers often initially develop an application using built-in floating point representations and later convert the application to a fixed point representation - a potentially time consuming process. In this paper, we present a hardware/software partitioning approach for floating point applications that eliminates the need for developers to rewrite software applications for fixed point implementations. Instead, the proposed approach incorporates efficient, configurable floating point to fixed point and fixed point to floating point hardware converters at the boundary between the hardware coprocessors and memory. This effectively separates the system into a floating point domain consisting of the microprocessor and memory subsystem and a fixed point domain consisting of the partitioned hardware coprocessors, thereby providing an efficient and rapid method for implementing fixed point hardware coprocessors.","PeriodicalId":300268,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis","volume":"428 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132690059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Speculative DMA for architecturally visible storage in instruction set extensions
Theo Kluter, P. Brisk, P. Ienne, E. Charbon
DOI: 10.1145/1450135.1450191

Instruction set extensions (ISEs) can accelerate embedded processor performance. Many algorithms for ISE generation have shown good potential, and some of them have recently been expanded to include Architecturally Visible Storage (AVS): compiler-controlled memories, similar to scratchpads, that are accessible only to ISEs. To achieve a speedup using AVS, Direct Memory Access (DMA) transfers are required to move data from main memory to the AVS. Unfortunately, this creates coherence problems between the AVS and the cache, which previous methods for ISEs with AVS failed to address. Additionally, these methods must leave many conservative DMA transfers in place, whose execution significantly limits the achievable speedup. This paper presents a memory coherence scheme for ISEs with AVS that ensures execution correctness and memory consistency with minimal area overhead. We also present a method that speculatively removes redundant DMA transfers. Cycle-accurate experimental results were obtained using an FPGA-emulation platform. These results show that application-specific instruction-set extended processors with speculative DMA-enhanced AVS gain significantly over previous techniques, despite the overhead of the coherence mechanism.
{"title":"Speculative DMA for architecturally visible storage in instruction set extensions","authors":"Theo Kluter, P. Brisk, P. Ienne, E. Charbon","doi":"10.1145/1450135.1450191","DOIUrl":"https://doi.org/10.1145/1450135.1450191","url":null,"abstract":"Instruction set extensions (ISEs) can accelerate embedded processor performance. Many algorithms for ISE generation have shown good potential; some of them have recently been expanded to include Architecturally Visible Storage (AVS) - compiler-controlled memories, similar to scratchpads, that are accessible only to ISEs. To achieve a speedup using AVS, Direct Memory Access (DMA) transfers are required to move data from the main memory to the AVS; unfortunately, this creates coherence problems between the AVS and the cache, which previous methods for ISEs with AVS failed to address; additionally, these methods need to leave many conservative DMA transfers in place, whose execution significantly limits the achievable speedup. This paper presents a memory coherence scheme for ISEs with AVS, which can ensure execution correctness and memory consistency with minimal area overhead. We also present a method that speculatively removes redundant DMA transfers. Cycle-accurate experimental results were obtained using an FPGA-emulation platform. These results show that the application-specific instruction-set extended processors with speculative DMA-enhanced AVS gain significantly over previous techniques, despite the overhead of the coherence mechanism.","PeriodicalId":300268,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130925637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Symbolic voter placement for dependability-aware system synthesis
Felix Reimann, M. Glaß, M. Lukasiewycz, J. Keinert, C. Haubelt, J. Teich
DOI: 10.1145/1450135.1450190

This paper presents a system synthesis approach for dependable embedded systems. The proposed approach significantly extends previous work by automatically inserting fault detection and fault-tolerance mechanisms into an implementation. The main contributions of this paper are 1) a dependability-aware system synthesis approach that automatically performs redundant task binding and placement of voting structures to increase both reliability and safety, 2) an efficient dependability analysis approach to evaluate lifetime reliability and safety, and 3) results from synthesizing a Motion-JPEG decoder for an FPGA platform using the proposed system synthesis approach. As a result, a set of high-quality implementations of the decoder is achieved, with maximized reliability, safety, and performance, and simultaneously minimized resource requirements.
{"title":"Symbolic voter placement for dependability-aware system synthesis","authors":"Felix Reimann, M. Glaß, M. Lukasiewycz, J. Keinert, C. Haubelt, J. Teich","doi":"10.1145/1450135.1450190","DOIUrl":"https://doi.org/10.1145/1450135.1450190","url":null,"abstract":"This paper presents a system synthesis approach for dependable embedded systems. The proposed approach significantly extends previous work by automatically inserting fault detection and fault toleration mechanisms into an implementation. The main contributions of this paper are 1) a dependability-aware system synthesis approach that automatically performs a redundant task binding and placement of voting structures to increase both, reliability and safety, respectively, 2) an efficient dependability analysis approach to evaluate lifetime reliability and safety, and 3) results from synthesizing a Motion-JPEG decoder for an FPGA platform using the proposed system synthesis approach. As a result, a set of high-quality solutions of the decoder with maximized reliability, safety, performance, and simultaneously minimized resource requirements is achieved.","PeriodicalId":300268,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114323667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Application specific non-volatile primary memory for embedded systems
Kwangyoon Lee, A. Orailoglu
DOI: 10.1145/1450135.1450144

Memory subsystems are among the most critical components of embedded systems, and they display increasing complexity as application requirements diversify. Modern embedded systems are generally equipped with multiple heterogeneous memory devices to satisfy diverse requirements and constraints. NAND flash memory has been widely adopted for data storage because of its outstanding benefits in cost, power, capacity, and non-volatility. In NAND flash memory, however, the intrinsic costs of read and write accesses are highly disproportionate in performance and access granularity. The consequent data management complexity and performance deterioration have so far precluded the adoption of NAND flash memory as primary memory. In this paper, we introduce a highly effective non-volatile primary memory architecture that incorporates application-specific information to build a NAND flash based primary memory. The proposed architecture provides a unified non-volatile primary memory solution that relieves the design complications caused by the growing complexity of memory subsystems. Our architecture aggressively minimizes the overhead and redundancy of NAND-based systems by exploiting efficient address space management and dynamic data migration based on accurate analysis of application behavior. We also propose a highly parallelized memory architecture that actively and dynamically redistributes data over multiple flash memories based on run-time workload analysis. The experimental results show that the proposed architecture significantly improves average memory access time, to a level comparable to the standard DRAM access cycle time, and considerably prolongs device lifetime through autonomous wear-leveling and minimization of program/erase operations.
{"title":"Application specific non-volatile primary memory for embedded systems","authors":"Kwangyoon Lee, A. Orailoglu","doi":"10.1145/1450135.1450144","DOIUrl":"https://doi.org/10.1145/1450135.1450144","url":null,"abstract":"Memory subsystems have been considered as one of the most critical components in embedded systems and furthermore, displaying increasing complexity as application requirements diversify. Modern embedded systems are generally equipped with multiple heterogeneous memory devices to satisfy diverse requirements and constraints. NAND flash memory has been widely adopted for data storage because of its outstanding benefits on cost, power, capacity and non-volatility. However, in NAND flash memory, the intrinsic costs for the read and write accesses are highly disproportionate in performance and access granularity. The consequent data management complexity and performance deterioration have precluded the adoption of NAND flash memory. In this paper, we introduce a highly effective non-volatile primary memory architecture which incorporates application specific information to develop a NAND flash based primary memory. The proposed architecture provides a unified non-volatile primary memory solution which relieves design complications caused by the growing complexity in memory subsystems. Our architecture aggressively minimizes the overhead and redundancy of the NAND based systems by exploiting efficient address space management and dynamic data migration based on accurate application behavioral analysis. We also propose a highly parallelized memory architecture through an active and dynamic data redistribution over the multiple flash memories based on run-time workload analysis. The experimental results show that our proposed architecture significantly enhances average memory access cycle time which is comparable to the standard DRAM access cycle time and also considerably prolongs the device life-cycle by autonomous wear-leveling and minimizing the program/erase operations.","PeriodicalId":300268,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120937480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Distributed flit-buffer flow control for networks-on-chip
Nicola Concer, M. Petracca, L. Carloni
DOI: 10.1145/1450135.1450183

The combination of flit-buffer flow control methods and latency-insensitive protocols is an effective solution for networks-on-chip (NoC). Since they both rely on backpressure, the two techniques are easy to combine while offering complementary advantages: low complexity of router design and the ability to cope with long communication channels via automatic wire pipelining. We study various alternative implementations of this idea by considering the combination of three different types of flit-buffer flow control methods and two different classes of channel repeaters (based respectively on flip-flops and relay stations). We characterize the area and performance of the two most promising alternative implementations for NoCs by completing the RTL design and logic synthesis of the repeaters and routers for different degrees of channel parallelism. Finally, we derive high-level abstractions of our circuit designs and use them to perform system-level simulations under various scenarios for two distinct NoC topologies and various applications. Based on our comparative analysis and experimental results, we propose a NoC design approach that combines the reduction of the router queues to a minimum size with the distribution of flit buffering onto the channels. This approach provides valuable flexibility during the physical design phase of many NoCs, particularly in systems-on-chip that must be designed to meet a tight constraint on the target clock frequency.
{"title":"Distributed flit-buffer flow control for networks-on-chip","authors":"Nicola Concer, M. Petracca, L. Carloni","doi":"10.1145/1450135.1450183","DOIUrl":"https://doi.org/10.1145/1450135.1450183","url":null,"abstract":"The combination of flit-buffer flow control methods and latency-insensitive protocols is an effective solution for networks-on-chip (NoC). Since they both rely on backpressure, the two techniques are easy to combine while offering complementary advantages: low complexity of router design and the ability to cope with long communication channels via automatic wire pipelining. We study various alternative implementations of this idea by considering the combination of three different types of flit-buffer flow control methods and two different classes of channel repeaters (based respectively on flip-flops and relay stations). We characterize the area and performance of the two most promising alternative implementations for NoCs by completing the RTL design and logic synthesis of the repeaters and routers for different channel parallelisms. Finally, we derive high-level abstractions of our circuit designs and we use them to perform system-level simulations under various scenarios for two distinct NoC topologies and various applications. Based on our comparative analysis and experimental results, we propose a NoC design approach that combines the reduction of the router queues to a minimum size with the distribution of flit buffering onto the channels. This approach provides precious flexibility during the physical design phase for many NoCs, particularly in those systems-on-chip that must be designed to meet a tight constraint on the target clock frequency.","PeriodicalId":300268,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130849833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LOCS: a low overhead profiler-driven design flow for security of MPSoCs
K. Patel, S. Parameswaran
DOI: 10.1145/1450135.1450154

Security is a growing concern in processor-based systems and hence requires immediate attention. New paradigms in the design of MPSoCs must be found, with security as one of the primary objectives. Software attacks like code injection attacks exploit vulnerabilities in "trusted" code. Previous countermeasures addressing code injection attacks in MPSoCs have significant performance overheads and do not check every single line of code. The work described in this paper reduces the performance overhead and ensures that all lines of the program code are checked. We propose an MPSoC system in which one processor (which we call a MONITOR processor) is responsible for supervising all other application processors. Our design flow, LOCS, instruments and profiles the execution of basic blocks in the program. LOCS subsequently uses the profiler output to re-instrument the source files to minimize runtime overheads. LOCS also aids in the design of the hardware customizations required by the MONITOR. At runtime, the MONITOR checks the validity of control flow transitions and the execution time of basic blocks. We implemented our system on a commercial extensible processor, Xtensa LX2, and tested it on three multimedia benchmarks. The experiments show that our system has a worst-case performance degradation of about 24% and an area overhead of approximately 40%. LOCS has smaller performance, area, and code size overheads than all previous code injection countermeasures for MPSoCs.
{"title":"LOCS: a low overhead profiler-driven design flow for security of MPSoCs","authors":"K. Patel, S. Parameswaran","doi":"10.1145/1450135.1450154","DOIUrl":"https://doi.org/10.1145/1450135.1450154","url":null,"abstract":"Security is a growing concern in processor based systems and hence requires immediate attention. New paradigms in the design of MPSoCs must be found, with security as one of the primary objectives. Software attacks like Code Injection Attacks exploit vulnerabilities in \"trusted\" code. Previous countermeasures addressing code injection attacks in MPSoCs have significant performance overheads and do not check every single line of code. The work described in this paper has reduced performance overhead and ensures that all the lines in the program code are checked.\u0000 We propose an MPSoC system where one processor (which we call a MONITOR processor) is responsible for supervising all other application processors. Our design flow, LOCS, instruments and profiles the execution of basic blocks in the program. LOCS subsequently uses the profiler output to re-instrument the source files to minimize runtime overheads. LOCS also aids in the design of hardware customizations required by the MONITOR. At runtime, the MONITOR checks the validity of the control flow transitions and the execution time of basic blocks.\u0000 We implemented our system on a commercial extensible processor, Xtensa LX2, and tested it on three multimedia benchmarks. The experiments show that our system has the worst-case performance degradation of about 24% and an area overhead of approximately 40%. LOCS has smaller performance, area and code size overheads than all previous code injection countermeasures for MPSoCs.","PeriodicalId":300268,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114367136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A time-predictable system initialization design for huge-capacity flash-memory storage systems
Chin-Hsien Wu
DOI: 10.1145/1450135.1450140

The capacity of flash-memory storage systems grows at a pace similar to that of many other storage systems. In order to keep product cost under control, vendors face serious challenges in system design. How to provide an expected system initialization time for huge-capacity flash-memory storage systems has therefore become an important research topic. In this paper, a time-predictable system initialization design is proposed for huge-capacity flash-memory storage systems. The objective of the design is to provide an expected system initialization time based on a coarse-grained flash translation layer. A time-predictability analysis of the design is provided that relates the size of main memory to the system initialization time, and the system initialization time can also be estimated and predicted using this analysis.
{"title":"A time-predictable system initialization design for huge-capacity flash-memory storage systems","authors":"Chin-Hsien Wu","doi":"10.1145/1450135.1450140","DOIUrl":"https://doi.org/10.1145/1450135.1450140","url":null,"abstract":"The capacity of flash-memory storage systems grows at a speed similar to many other storage systems. In order to properly manage the product cost, vendors face serious challenges in system designs. How to provide an expected system initialization time for huge-capacity flash-memory storage systems has become an important research topic. In this paper, a time-predictable system initialization design is proposed for huge-capacity flash-memory storage systems. The objective of the design is to provide an expected system initialization time based on a coarse-grained flash translation layer. The time-predictable analysis of the design is provided to discuss the relation between the size of main memory and the system initialization time. The system initialization time can be also estimated and predicted by the time-predictable analysis.","PeriodicalId":300268,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis","volume":"10 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131943882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Holistic design and caching in mobile computing
Mwaffaq Otoom, J. M. Paul
DOI: 10.1145/1450135.1450161

We utilize application trend analysis, focused on webpage content, in order to examine the design of mobile computers more holistically. We find that both Internet bandwidth and processing local to the computing device are being wasted on the re-transmission of formatting data. By taking this broader view, and separating Macromedia Flash content into raw data and its packaging, we show that performance can be increased by 84%, power consumption can be decreased by 71%, and an order of magnitude of communications bandwidth can be saved.
{"title":"Holistic design and caching in mobile computing","authors":"Mwaffaq Otoom, J. M. Paul","doi":"10.1145/1450135.1450161","DOIUrl":"https://doi.org/10.1145/1450135.1450161","url":null,"abstract":"We utilize application trends analysis, focused on webpage content, in order to examine the design of mobile computers more holistically. We find that both Internet bandwidth and processing local to the computing device is being wasted by re-transmission of formatting data. By taking this broader view, and separating Macromedia Flash content into raw data and its packaging, we show that performance can be increased by 84%, power consumption can be decreased by 71%, and communications bandwidth can be saved by an order of magnitude.","PeriodicalId":300268,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126638163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}