Huaxi Gu, Mo Kwai Hung Morton, Jiang Xu, Wei Zhang
Networks-on-chip (NoCs) can improve the communication bandwidth and power efficiency of multiprocessor systems-on-chip (MPSoC). However, traditional metallic interconnects consume significant amount of power to deliver even higher communication bandwidth required in the near future. Optical NoCs are based on optical interconnects and optical routers, and have significant bandwidth and power advantages. This paper proposed a high-performance low-power low-cost optical router, Cygnus, for optical NoCs. Cygnus is non-blocking and based on silicon microresonators. We compared Cygnus with other microresonator-based routers, and analyzed their power consumption, optical power insertion loss, and the number of microresonators used in detail. The results show that Cygnus has the lowest power consumption and losses, and requires the lowest number of microresonators. For example, Cygnus has 50% less power consumption, 51% less optical power insertion loss, and 20% less microresonators than the optimized traditional optical crossbar router. Comparing to a high-performance 45nm electronic router, Cygnus consumes 96% less power. Moreover, the passive routing feature of Cygnus guarantees that, while using dimension order routing algorithm, the maximum power consumption to route a packet through a network is a small constant number, regardless of the network size. For example, the maximum power consumption is 4.80fJ/bit under current technologies. We simulated and analyzed an 8x8 2D mesh NoC built from Cygnus and showed the end-to-end delay and network throughput under different offered loads and packet sizes.
{"title":"A Low-power Low-cost Optical Router for Optical Networks-on-Chip in Multiprocessor Systems-on-Chip","authors":"Huaxi Gu, Mo Kwai Hung Morton, Jiang Xu, Wei Zhang","doi":"10.1109/ISVLSI.2009.19","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.19","url":null,"abstract":"Networks-on-chip (NoCs) can improve the communication bandwidth and power efficiency of multiprocessor systems-on-chip (MPSoC). However, traditional metallic interconnects consume significant amount of power to deliver even higher communication bandwidth required in the near future. Optical NoCs are based on optical interconnects and optical routers, and have significant bandwidth and power advantages. This paper proposed a high-performance low-power low-cost optical router, Cygnus, for optical NoCs. Cygnus is non-blocking and based on silicon microresonators. We compared Cygnus with other microresonator-based routers, and analyzed their power consumption, optical power insertion loss, and the number of microresonators used in detail. The results show that Cygnus has the lowest power consumption and losses, and requires the lowest number of microresonators. For example, Cygnus has 50% less power consumption, 51% less optical power insertion loss, and 20% less microresonators than the optimized traditional optical crossbar router. Comparing to a high-performance 45nm electronic router, Cygnus consumes 96% less power. Moreover, the passive routing feature of Cygnus guarantees that, while using dimension order routing algorithm, the maximum power consumption to route a packet through a network is a small constant number, regardless of the network size. For example, the maximum power consumption is 4.80fJ/bit under current technologies. We simulated and analyzed an 8x8 2D mesh NoC built from Cygnus and showed the end-to-end delay and network throughput under different offered loads and packet sizes.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"165 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115488711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic power dissipation of a CMOS VLSI circuit depends on the signal activity at gate outputs. The activity includes the steady-state logic transitions as well as glitches. The latter are a function of gate delays, which, for modern VLSI circuits, have wide process-related variations. Both average and peak power dissipation are useful and are traditionally estimated by Monte Carlo simulation. This is expensive and the accuracy, especially for peak power,depends upon the number of circuit delay samples that are simulated. We present an alternative. We use zero-delay simulation of a vector pair to determine the steady-state logic activity. We derive linear-time algorithms that, using delay bounds for gates, determine the maximum, minimum and average number of transitions that each gate output can produce. From this information, we estimate the average and peak energy consumed by each vector pair in a given vector set. For a set of random vectors applied to c7552 circuit, our analysis determined the per-vector energy consumption as 82.2 picojoules average and 196.3 picojoules peak. In comparison, Monte Carlo simulation of 1,000 circuit samples gave 82.8 picojoules average and 146.1 picojoules peak. The discrepancy of the peak consumption will reduce if more samples were simulated in the Monte Carlo method. Even with 1,000 samples the CPU time of the Monte Carlo analysis was three orders of magnitude greater than the alternative method we offer in this paper.
{"title":"Algorithms for Estimating Number of Glitches and Dynamic Power in CMOS Circuits with Delay Variations","authors":"Jins D. Alexander, V. Agrawal","doi":"10.1109/ISVLSI.2009.57","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.57","url":null,"abstract":"Dynamic power dissipation of a CMOS VLSI circuit depends on the signal activity at gate outputs. The activity includes the steady-state logic transitions as well as glitches. The latter are a function of gate delays, which, for modern VLSI circuits, have wide process-related variations. Both average and peak power dissipation are useful and are traditionally estimated by Monte Carlo simulation. This is expensive and the accuracy, especially for peak power,depends upon the number of circuit delay samples that are simulated. We present an alternative. We use zero-delay simulation of a vector pair to determine the steady-state logic activity. We derive linear-time algorithms that, using delay bounds for gates, determine the maximum, minimum and average number of transitions that each gate output can produce. From this information, we estimate the average and peak energy consumed by each vector pair in a given vector set. For a set of random vectors applied to c7552 circuit, our analysis determined the per-vector energy consumption as 82.2 picojoules average and 196.3 picojoules peak. In comparison, Monte Carlo simulation of 1,000 circuit samples gave 82.8 picojoules average and 146.1 picojoules peak. The discrepancy of the peak consumption will reduce if more samples were simulated in the Monte Carlo method. Even with 1,000 samples the CPU time of the Monte Carlo analysis was three orders of magnitude greater than the alternative method we offer in this paper.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"176 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120954213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reuse of Intellectual Property (IP) of VLSI physical design facilitates integration of more components on a single chip in shrinking time-to-market. For intellectual property protection (IPP), various kinds of IP marks are embedded into the design for establishing the veracity of a legal owner. However, public verification of IP marks is not leakage-proof. Current techniques include a sufficiently large set of public marks containing a header and a message body in addition to private ones to facilitate only public verification at the cost of significant increase in design overhead. But these techniques are not effective, as attackers manage to obtain potential clues to tamper public marks rendering public verification invalid and may also suitably override the marks to include own signature resulting in wrong public identification of IP owner. Here we propose a zero-knowledge protocol to ensure robust and absolutely leakage proof convincing public verification with the help of private marks. We have tested our protocol for FPGA benchmarks. The results on overhead and robustness are encouraging.
{"title":"Secure Leakage-Proof Public Verification of IP Marks in VLSI Physical Design","authors":"Debasri Saha, S. Sur-Kolay","doi":"10.1109/ISVLSI.2009.35","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.35","url":null,"abstract":"Reuse of Intellectual Property (IP) of VLSI physical design facilitates integration of more components on a single chip in shrinking time-to-market. For intellectual property protection (IPP), various kinds of IP marks are embedded into the design for establishing the veracity of a legal owner. However, public verification of IP marks is not leakage-proof. Current techniques include a sufficiently large set of public marks containing a header and a message body in addition to private ones to facilitate only public verification at the cost of significant increase in design overhead. But these techniques are not effective, as attackers manage to obtain potential clues to tamper public marks rendering public verification invalid and may also suitably override the marks to include own signature resulting in wrong public identification of IP owner. Here we propose a zero-knowledge protocol to ensure robust and absolutely leakage proof convincing public verification with the help of private marks. We have tested our protocol for FPGA benchmarks. The results on overhead and robustness are encouraging.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125627762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. Chu, Rung-Bin Lin, Da-Wei Hsu, Yu-Hsing Chen, Wei-Chiu Tseng
Effective algorithms have been invented for post-routing redundant via insertion (RVI). However, implementations of these algorithms often ignore some practical issues. In this article, we implement a post-routing RVI algorithm that takes into account interconnect contexts during RVI. Experimental results show that our context-aware RVI on average raises via1 (vias between metal layer 1 and 2) insertion rate from 37.4% to 72.1% and total insertion rate from 72.5% to 85.8%. On average, it increases RVI rate of critical paths by 3.6%. Besides, with redundant pin-area minimization, our approach reduces metal 1 and metal 2 area used for RVI at pins by 3%.
{"title":"Context-aware Post Routing Redundant Via Insertion","authors":"P. Chu, Rung-Bin Lin, Da-Wei Hsu, Yu-Hsing Chen, Wei-Chiu Tseng","doi":"10.1109/ISVLSI.2009.39","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.39","url":null,"abstract":"Effective algorithms have been invented for post-routing redundant via insertion (RVI). However, implementations of these algorithms often ignore some practical issues. In this article, we implement a post-routing RVI algorithm that takes into account interconnect contexts during RVI. Experimental results show that our context-aware RVI on average raises via1 (vias between metal layer 1 and 2) insertion rate from 37.4% to 72.1% and total insertion rate from 72.5% to 85.8%. On average, it increases RVI rate of critical paths by 3.6%. Besides, with redundant pin-area minimization, our approach reduces metal 1 and metal 2 area used for RVI at pins by 3%.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"92 11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128019611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the development of SOC designs, modern floorplanning typically needs to provide extra options to meet the different emerging requirements in the hierarchical designs, such as boundary constraint for I/O connection, clustering constraint for performance and reliability, etc. This paper addresses modern floorplanning with boundary clustering constraint. It has been empirically shown that the modern constraints extremely restrict the solution space; that is, a large number of randomly generated floorplans might be infeasible. In order to effectively search the feasible solutions, the feasible conditions based on B*-tree representation with boundary clustering constraint are investigated. The properties, coupled with an efficient simulated annealing algorithm, provide the way to produce feasible floorplans by dynamic repairing, which can transform an infeasible solution into a feasible one if the constraint is violated. Our algorithm is verified by using the MCNC and GSRC benchmarks, and the empirical results show that our algorithm can obtain promising solutions in acceptable time
{"title":"Modern Floorplanning with Boundary Clustering Constraint","authors":"Li Li, Yuchun Ma, N. Xu, Yu Wang, Xianlong Hong","doi":"10.1109/ISVLSI.2009.24","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.24","url":null,"abstract":"With the development of SOC designs, modern floorplanning typically needs to provide extra options to meet the different emerging requirements in the hierarchical designs, such as boundary constraint for I/O connection, clustering constraint for performance and reliability, etc. This paper addresses modern floorplanning with boundary clustering constraint. It has been empirically shown that the modern constraints extremely restrict the solution space; that is, a large number of randomly generated floorplans might be infeasible. In order to effectively search the feasible solutions, the feasible conditions based on B*-tree representation with boundary clustering constraint are investigated. The properties, coupled with an efficient simulated annealing algorithm, provide the way to produce feasible floorplans by dynamic repairing, which can transform an infeasible solution into a feasible one if the constraint is violated. Our algorithm is verified by using the MCNC and GSRC benchmarks, and the empirical results show that our algorithm can obtain promising solutions in acceptable time","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129453950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
H. Fazeel, L. Raghavan, Chandrasekaran Srinivasaraman, Manish Jain
Low static phase offset is desired in Phase Locked Loops (PLL) employed in high speed I/O interfaces and frequency synthesizers. In this work, non idealities in phase frequency detector and charge pump contributing to static phase offset have been studied and their relative contributions analyzed in detail. A new charge pump architecture with reduced mismatch between Up and Dn current sources has been presented. It makes use of a single two stage amplifier for both current steering and reduction of mismatch. The efficacy of this architecture has been demonstrated with simulation results on a PLL running at an input reference frequency of 500MHz in65nm CMOS technology.
{"title":"Reduction of Current Mismatch in PLL Charge Pump","authors":"H. Fazeel, L. Raghavan, Chandrasekaran Srinivasaraman, Manish Jain","doi":"10.1109/ISVLSI.2009.45","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.45","url":null,"abstract":"Low static phase offset is desired in Phase Locked Loops (PLL) employed in high speed I/O interfaces and frequency synthesizers. In this work, non idealities in phase frequency detector and charge pump contributing to static phase offset have been studied and their relative contributions analyzed in detail. A new charge pump architecture with reduced mismatch between Up and Dn current sources has been presented. It makes use of a single two stage amplifier for both current steering and reduction of mismatch. The efficacy of this architecture has been demonstrated with simulation results on a PLL running at an input reference frequency of 500MHz in65nm CMOS technology.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":" 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120834021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As technology scales, leakage power shares a dominant part in the total power dissipation of the chip and reaches up to 50% or even higher at elevated temperatures in 45 nm technology. Leakage power dissipation is especially problematic for FPGAs due to their reconfigurable nature and large number of inactive resources. Body biasing is an efficient technique to reduce leakage current which has been widely adopted in 45nm technology low power architectures.FPGAs with coarse grained body bias control only incurred about 10% of the area overhead while increasing the granularity to the finest level dramatically increases the area overhead over 100%. However, the coarse grained body bias control FPGA may not result in satisfactory leakage power reduction since all the paths passing a resource must have enough slacks. To overcome the assignment limitation, we propose a novel FPGA architecture which uses body biasing technique and clock skew scheduling at a coarse grained architecture level. Clock skew scheduling technique only incurs 3.35% of additional area overhead in order to distribute slack to the resource instead of increasing the minimum body-bias granularity. Further, we propose a body bias assignment algorithm to leverage the proposed architecture. Experimental results demonstrate that the proposed architecture achieved an average leakage reduction of about 76% as compared to 61% of coarse grained architecture.
{"title":"A Novel Low Area Overhead Body Bias FPGA Architecture for Low Power Applications","authors":"Sungmin Bae, K. Ramakrishnan, N. Vijaykrishnan","doi":"10.1109/ISVLSI.2009.51","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.51","url":null,"abstract":"As technology scales, leakage power shares a dominant part in the total power dissipation of the chip and reaches up to 50% or even higher at elevated temperatures in 45 nm technology. Leakage power dissipation is especially problematic for FPGAs due to their reconfigurable nature and large number of inactive resources. Body biasing is an efficient technique to reduce leakage current which has been widely adopted in 45nm technology low power architectures.FPGAs with coarse grained body bias control only incurred about 10% of the area overhead while increasing the granularity to the finest level dramatically increases the area overhead over 100%. However, the coarse grained body bias control FPGA may not result in satisfactory leakage power reduction since all the paths passing a resource must have enough slacks. To overcome the assignment limitation, we propose a novel FPGA architecture which uses body biasing technique and clock skew scheduling at a coarse grained architecture level. Clock skew scheduling technique only incurs 3.35% of additional area overhead in order to distribute slack to the resource instead of increasing the minimum body-bias granularity. Further, we propose a body bias assignment algorithm to leverage the proposed architecture. Experimental results demonstrate that the proposed architecture achieved an average leakage reduction of about 76% as compared to 61% of coarse grained architecture.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115133579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Taecheol Oh, Hyunjin Lee, Kiyeon Lee, Sangyeun Cho
A key design issue for chip multiprocessors (CMPs) is how to exploit the finite chip area to get the best system throughput.The most dominant area-consuming components in a CMP are processor cores and caches today.There is an important trade-off between the number of cores and the amount of cache in a single CMP chip.If we have too few cores, the system throughput will be limited by the number of threads.If we have too small cache capacity, the system may perform poorly due to frequent cache misses.This paper presents a simple and effective analytical model to study the trade-off of the core count and the cache capacity in a CMP under a finite die area constraint.Our model differentiates shared, private, and hybrid cache organizations.Our work will complement more detailed yet time-consuming simulation approaches by enabling one to quickly study how key chip area allocation parameters affect the system performance.
{"title":"An Analytical Model to Study Optimal Area Breakdown between Cores and Caches in a Chip Multiprocessor","authors":"Taecheol Oh, Hyunjin Lee, Kiyeon Lee, Sangyeun Cho","doi":"10.1109/ISVLSI.2009.27","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.27","url":null,"abstract":"A key design issue for chip multiprocessors (CMPs) is how to exploit the finite chip area to get the best system throughput.The most dominant area-consuming components in a CMP are processor cores and caches today.There is an important trade-off between the number of cores and the amount of cache in a single CMP chip.If we have too few cores, the system throughput will be limited by the number of threads.If we have too small cache capacity, the system may perform poorly due to frequent cache misses.This paper presents a simple and effective analytical model to study the trade-off of the core count and the cache capacity in a CMP under a finite die area constraint.Our model differentiates shared, private, and hybrid cache organizations.Our work will complement more detailed yet time-consuming simulation approaches by enabling one to quickly study how key chip area allocation parameters affect the system performance.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117041547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A duty cycle correction circuit (DCC) for high frequency clocks with fine resolution is designed and tested at 1.2V in 90nm CMOS process. Spice simulations show that this duty cycle corrector can adjust the output duty cycle to 50±0.5% with input clock at 500MHz and input duty cycle ranging from20% to 80%. DCC will not introduce any delay in the forward path, which makes it suitable for multi-phase clock applications. The proposed implementation uses the high frequency delay line and MUTEX (Mutual Exclusion Element) based circuit for achieving high resolution.
{"title":"All Digital Duty Cycle Correction Circuit in 90nm Based on Mutex","authors":"S. Ramasahayam, M. Srinivas","doi":"10.1109/ISVLSI.2009.41","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.41","url":null,"abstract":"A duty cycle correction circuit (DCC) for high frequency clocks with fine resolution is designed and tested at 1.2V in 90nm CMOS process. Spice simulations show that this duty cycle corrector can adjust the output duty cycle to 50±0.5% with input clock at 500MHz and input duty cycle ranging from20% to 80%. DCC will not introduce any delay in the forward path, which makes it suitable for multi-phase clock applications. The proposed implementation uses the high frequency delay line and MUTEX (Mutual Exclusion Element) based circuit for achieving high resolution.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"54 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128078197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lossless compression is widely used to improve both memory requirement and communication bandwidth in embedded systems. Dictionary based compression techniques are very popular because of their good compression efficiency and fast decompression mechanism. Bitmask based compression improves the effectiveness of the dictionary based approaches by recording minor differences using bitmasks. This paper proposes an efficient encoding of bitmasks used in bitmask-based compression. We prove that a n-bit bitmask (records n differences) can be encoded using only n-1 bits. This encoding improves compression efficiency while reduces decompression hardware overhead. We have applied our approach in a wide a variety of domains including code compression, FPGA bitstream compression as well as control word compression. Our experimental results using a wide variety of benchmarks demonstrate that our approach improves the compression efficiency by 3 to 10% without adding any additional decompression overhead.
{"title":"Lossless Compression Using Efficient Encoding of Bitmasks","authors":"C. Murthy, P. Mishra","doi":"10.1109/ISVLSI.2009.18","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.18","url":null,"abstract":"Lossless compression is widely used to improve both memory requirement and communication bandwidth in embedded systems. Dictionary based compression techniques are very popular because of their good compression efficiency and fast decompression mechanism. Bitmask based compression improves the effectiveness of the dictionary based approaches by recording minor differences using bitmasks. This paper proposes an efficient encoding of bitmasks used in bitmask-based compression. We prove that a n-bit bitmask (records n differences) can be encoded using only n-1 bits. This encoding improves compression efficiency while reduces decompression hardware overhead. We have applied our approach in a wide a variety of domains including code compression, FPGA bitstream compression as well as control word compression. Our experimental results using a wide variety of benchmarks demonstrate that our approach improves the compression efficiency by 3 to 10% without adding any additional decompression overhead.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"173 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131181085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}