Jins D. Alexander, V. Agrawal, "Algorithms for Estimating Number of Glitches and Dynamic Power in CMOS Circuits with Delay Variations," ISVLSI 2009. doi:10.1109/ISVLSI.2009.57

Dynamic power dissipation of a CMOS VLSI circuit depends on the signal activity at gate outputs. This activity includes steady-state logic transitions as well as glitches. The latter are a function of gate delays, which, for modern VLSI circuits, have wide process-related variations. Both average and peak power dissipation are useful metrics and are traditionally estimated by Monte Carlo simulation. This is expensive, and the accuracy, especially for peak power, depends on the number of circuit delay samples simulated. We present an alternative. We use zero-delay simulation of a vector pair to determine the steady-state logic activity. We derive linear-time algorithms that, using delay bounds for gates, determine the maximum, minimum, and average number of transitions that each gate output can produce. From this information, we estimate the average and peak energy consumed by each vector pair in a given vector set. For a set of random vectors applied to the c7552 benchmark circuit, our analysis determined the per-vector energy consumption as 82.2 picojoules average and 196.3 picojoules peak. In comparison, Monte Carlo simulation of 1,000 circuit samples gave 82.8 picojoules average and 146.1 picojoules peak. The discrepancy in peak consumption would shrink if more samples were simulated in the Monte Carlo method. Even with 1,000 samples, the CPU time of the Monte Carlo analysis was three orders of magnitude greater than that of the method offered in this paper.
Debasri Saha, S. Sur-Kolay, "Secure Leakage-Proof Public Verification of IP Marks in VLSI Physical Design," ISVLSI 2009. doi:10.1109/ISVLSI.2009.35

Reuse of Intellectual Property (IP) in VLSI physical design facilitates the integration of more components on a single chip within a shrinking time-to-market. For intellectual property protection (IPP), various kinds of IP marks are embedded into the design to establish the veracity of the legal owner. However, public verification of IP marks is not leakage-proof. Current techniques embed, in addition to private marks, a sufficiently large set of public marks containing a header and a message body to facilitate public verification, at the cost of a significant increase in design overhead. These techniques are not effective, as attackers manage to obtain potential clues to tamper with public marks, rendering public verification invalid, and may also override the marks with their own signature, resulting in wrong public identification of the IP owner. Here we propose a zero-knowledge protocol that uses private marks to ensure robust, completely leakage-proof, and convincing public verification. We have tested our protocol on FPGA benchmarks. The results on overhead and robustness are encouraging.
A. Choudhary, S. Kundu, "A Process Variation Tolerant Self-Compensating Sense Amplifier Design," ISVLSI 2009. doi:10.1109/ISVLSI.2009.50

Lithography-related CD variations, fluctuations in dopant density, oxide-thickness variations, and parametric variations of devices are identified as major challenges in the ITRS. Due to the growth in size of embedded SRAMs as well as the use of sense-amplifier-based signaling techniques, process variation in sense amplifiers leads to significant loss of yield. In this paper, we present a process-variation-tolerant, self-compensating sense amplifier design that uses active compensation circuitry. Results from statistical simulation in a 32 nm process show that the proposed active compensation is highly effective, restoring yield to a level comparable to that of sense amplifiers without significant process variations.
P. Chu, Rung-Bin Lin, Da-Wei Hsu, Yu-Hsing Chen, Wei-Chiu Tseng, "Context-aware Post Routing Redundant Via Insertion," ISVLSI 2009. doi:10.1109/ISVLSI.2009.39

Effective algorithms have been invented for post-routing redundant via insertion (RVI). However, implementations of these algorithms often ignore some practical issues. In this article, we implement a post-routing RVI algorithm that takes interconnect contexts into account during RVI. Experimental results show that our context-aware RVI on average raises the via1 (vias between metal layers 1 and 2) insertion rate from 37.4% to 72.1% and the total insertion rate from 72.5% to 85.8%. On average, it increases the RVI rate of critical paths by 3.6%. In addition, with redundant pin-area minimization, our approach reduces the metal 1 and metal 2 area used for RVI at pins by 3%.
Li Li, Yuchun Ma, N. Xu, Yu Wang, Xianlong Hong, "Modern Floorplanning with Boundary Clustering Constraint," ISVLSI 2009. doi:10.1109/ISVLSI.2009.24

With the development of SOC designs, modern floorplanning typically needs to provide extra options to meet emerging requirements in hierarchical designs, such as boundary constraints for I/O connection and clustering constraints for performance and reliability. This paper addresses modern floorplanning with a boundary clustering constraint. It has been empirically shown that such constraints severely restrict the solution space; that is, a large number of randomly generated floorplans may be infeasible. To search the feasible solutions effectively, we investigate the feasibility conditions of the B*-tree representation under the boundary clustering constraint. These properties, coupled with an efficient simulated annealing algorithm, allow feasible floorplans to be produced by dynamic repairing, which transforms an infeasible solution into a feasible one whenever the constraint is violated. Our algorithm is verified on the MCNC and GSRC benchmarks, and the empirical results show that it obtains promising solutions in acceptable time.
H. Fazeel, L. Raghavan, Chandrasekaran Srinivasaraman, Manish Jain, "Reduction of Current Mismatch in PLL Charge Pump," ISVLSI 2009. doi:10.1109/ISVLSI.2009.45

A low static phase offset is desired in phase-locked loops (PLLs) employed in high-speed I/O interfaces and frequency synthesizers. In this work, the non-idealities in the phase frequency detector and charge pump that contribute to static phase offset are studied, and their relative contributions are analyzed in detail. A new charge pump architecture with reduced mismatch between the Up and Dn current sources is presented. It uses a single two-stage amplifier for both current steering and mismatch reduction. The efficacy of this architecture is demonstrated with simulation results on a PLL running at an input reference frequency of 500 MHz in 65 nm CMOS technology.
Sungmin Bae, K. Ramakrishnan, N. Vijaykrishnan, "A Novel Low Area Overhead Body Bias FPGA Architecture for Low Power Applications," ISVLSI 2009. doi:10.1109/ISVLSI.2009.51

As technology scales, leakage power accounts for a dominant share of a chip's total power dissipation, reaching 50% or more at elevated temperatures in 45 nm technology. Leakage power is especially problematic for FPGAs because of their reconfigurable nature and large number of inactive resources. Body biasing is an efficient leakage-reduction technique that has been widely adopted in 45 nm low-power architectures. FPGAs with coarse-grained body bias control incur only about 10% area overhead, whereas increasing the granularity to the finest level raises the area overhead beyond 100%. However, coarse-grained body bias control may not yield satisfactory leakage reduction, since every path passing through a resource must have enough slack. To overcome this assignment limitation, we propose a novel FPGA architecture that combines body biasing with clock skew scheduling at a coarse-grained architecture level. Clock skew scheduling incurs only 3.35% additional area overhead to distribute slack to resources, instead of increasing the minimum body-bias granularity. Further, we propose a body bias assignment algorithm that leverages the proposed architecture. Experimental results demonstrate that the proposed architecture achieves an average leakage reduction of about 76%, compared with 61% for the coarse-grained architecture.
Taecheol Oh, Hyunjin Lee, Kiyeon Lee, Sangyeun Cho, "An Analytical Model to Study Optimal Area Breakdown between Cores and Caches in a Chip Multiprocessor," ISVLSI 2009. doi:10.1109/ISVLSI.2009.27

A key design issue for chip multiprocessors (CMPs) is how to exploit the finite chip area to obtain the best system throughput. The most dominant area-consuming components in a CMP today are processor cores and caches. There is an important trade-off between the number of cores and the amount of cache on a single CMP chip. With too few cores, system throughput is limited by the number of threads; with too little cache capacity, the system may perform poorly due to frequent cache misses. This paper presents a simple and effective analytical model to study the trade-off between core count and cache capacity in a CMP under a finite die-area constraint. Our model differentiates shared, private, and hybrid cache organizations. Our work complements more detailed yet time-consuming simulation approaches by enabling one to quickly study how key chip-area allocation parameters affect system performance.
Yan Xu, Weichen Liu, Yu Wang, Jiang Xu, Xiaoming Chen, Huazhong Yang, "On-line MPSoC Scheduling Considering Power Gating Induced Power/Ground Noise," ISVLSI 2009. doi:10.1109/ISVLSI.2009.54

Power-gating-induced power/ground (P/G) noise is a major reliability problem facing low-power MPSoCs that use power gating techniques. Powering a processing unit on or off induces large P/G noise, which can cause timing divergence and even functional errors in surrounding processing units. Unlike thermal or energy effects, which are cumulative, the noise level must be predicted, and victim circuits protected, before the noise is induced. Hence, the power-gating-aware scheduling problem with P/G noise consideration should be solved with an on-line method that accounts for run-time variation in tasks' execution times. In this paper, we formulate an on-line task scheduling problem that considers P/G noise, based on our detailed P/G noise analysis platform for MPSoCs. An efficient on-line greedy heuristic (GH) algorithm that adapts well to real-time variation is proposed to reduce the noise-protection penalty and improve MPSoC performance. Our experiments show that the algorithm achieves an average 26% performance improvement together with an average 73% saving in noise-protection penalty compared with the conservative stop-go method. We also compare our technique with a two-step solution that computes a static schedule at compile time and adjusts it according to run-time variations. For benchmarks with larger task counts, the GH method achieves impressive performance improvements over the two-step solution.
C. Murthy, P. Mishra, "Lossless Compression Using Efficient Encoding of Bitmasks," ISVLSI 2009. doi:10.1109/ISVLSI.2009.18

Lossless compression is widely used to reduce both memory requirements and communication bandwidth in embedded systems. Dictionary-based compression techniques are very popular because of their good compression efficiency and fast decompression. Bitmask-based compression improves the effectiveness of dictionary-based approaches by recording minor differences using bitmasks. This paper proposes an efficient encoding of the bitmasks used in bitmask-based compression. We prove that an n-bit bitmask (which records n differences) can be encoded using only n-1 bits. This encoding improves compression efficiency while reducing decompression hardware overhead. We have applied our approach in a wide variety of domains, including code compression, FPGA bitstream compression, and control-word compression. Our experimental results on a wide variety of benchmarks demonstrate that our approach improves compression efficiency by 3 to 10% without adding any decompression overhead.