Power-aware soft error hardening via selective voltage scaling
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751877
Kai-Chiang Wu, Diana Marculescu
Nanoscale integrated circuits are becoming increasingly sensitive to radiation-induced transient faults (soft errors) due to current technology scaling trends, such as shrinking feature sizes and decreasing supply voltages. Soft errors, long a significant concern in memories, are now a main factor in the reliability degradation of logic circuits. This paper presents a power-aware methodology that uses dual supply voltages for soft error hardening. Given a constraint on power overhead, our proposed framework minimizes the soft error rate (SER) of a circuit via selective voltage scaling. On average, circuit SER can be reduced by 33.45% across various transient glitch sizes with only an 11.74% energy increase. The overhead in normalized power-delay-area product per 1% of SER reduction is 0.64%, 1.33X lower than that of existing state-of-the-art approaches.
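As a rough sketch of how a power-constrained selection loop of this flavor might look (the gate names, gain/cost numbers, and the greedy ratio heuristic are all illustrative assumptions, not the authors' framework), consider:

```python
# Hypothetical sketch of budget-constrained selective voltage assignment.
# ser_gain[g]: estimated SER reduction if gate g runs at the high supply.
# power_cost[g]: extra power drawn by gate g at the high supply.
def select_high_vdd_gates(gates, ser_gain, power_cost, power_budget):
    """Greedily pick gates to move to the high supply voltage,
    maximizing SER reduction per unit of power overhead."""
    # Sort by benefit/cost ratio, best first.
    ranked = sorted(gates, key=lambda g: ser_gain[g] / power_cost[g], reverse=True)
    chosen, spent = [], 0.0
    for g in ranked:
        if spent + power_cost[g] <= power_budget:
            chosen.append(g)
            spent += power_cost[g]
    return chosen, spent

gates = ["g1", "g2", "g3", "g4"]
ser_gain = {"g1": 0.9, "g2": 0.5, "g3": 0.4, "g4": 0.1}
power_cost = {"g1": 0.3, "g2": 0.25, "g3": 0.1, "g4": 0.05}
print(select_high_vdd_gates(gates, ser_gain, power_cost, power_budget=0.5))
```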
{"title":"Power-aware soft error hardening via selective voltage scaling","authors":"Kai-Chiang Wu, Diana Marculescu","doi":"10.1109/ICCD.2008.4751877","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751877","url":null,"abstract":"Nanoscale integrated circuits are becoming increasingly sensitive to radiation-induced transient faults (soft errors) due to current technology scaling trends, such as shrinking feature sizes and reducing supply voltages. Soft errors, which have been a significant concern in memories, are now a main factor in reliability degradation of logic circuits. This paper presents a power-aware methodology using dual supply voltages for soft error hardening. Given a constraint on power overhead, our proposed framework can minimize the soft error rate (SER) of a circuit via selective voltage scaling. On average, circuit SER can be reduced by 33.45% for various sizes of transient glitches with only 11.74% energy increase. The overhead in normalized power-delay-area product per 1% SER reduction is 0.64%, 1.33X less than that of existing state-of-the-art approaches.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131620213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ring data location prediction scheme for Non-Uniform Cache Architectures
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751936
Sayaka Akioka, Feihui Li, K. Malkowski, P. Raghavan, M. Kandemir, M. J. Irwin
Increases in cache capacity are accompanied by growing wire delays due to technology scaling. Non-uniform cache architecture (NUCA) is one of the proposed solutions for reducing the average access latency in such cache designs. While most prior NUCA work focuses on data placement, data replacement, and migration-related issues, this paper studies the problem of data search (access) in NUCA. In our architecture, we arrange sets of banks with equal access latency into rings. Our last-access-based (LAB) prediction scheme predicts the ring that is expected to contain the required data and checks the banks in that ring first for the data block sought. We compare our scheme to two alternative approaches: searching all rings in parallel and searching rings sequentially. We show that our LAB ring prediction scheme reduces L2 energy significantly over the sequential and parallel schemes while maintaining similar performance. Our LAB scheme reduces energy consumption by 15.9% relative to the sequential lookup scheme and 53.8% relative to the parallel lookup scheme.
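A minimal sketch of a last-access-based ring predictor, assuming a per-set table of the ring that last served each set (the table organization is a guess, not the paper's design):

```python
# Minimal sketch of a last-access-based (LAB) ring predictor, assuming a
# NUCA L2 whose banks are grouped into rings of equal access latency.
# Table indexing and sizing are illustrative, not the paper's design.
class LABPredictor:
    def __init__(self, num_sets, num_rings):
        self.num_rings = num_rings
        self.last_ring = [0] * num_sets  # ring that last served each set

    def lookup_order(self, set_index):
        """Probe the predicted ring first, then the remaining rings."""
        predicted = self.last_ring[set_index]
        others = [r for r in range(self.num_rings) if r != predicted]
        return [predicted] + others

    def update(self, set_index, hit_ring):
        self.last_ring[set_index] = hit_ring

pred = LABPredictor(num_sets=1024, num_rings=4)
print(pred.lookup_order(42))   # [0, 1, 2, 3] before any history
pred.update(42, hit_ring=2)
print(pred.lookup_order(42))   # [2, 0, 1, 3] after a hit in ring 2
```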
{"title":"Ring data location prediction scheme for Non-Uniform Cache Architectures","authors":"Sayaka Akioka, Feihui Li, K. Malkowski, P. Raghavan, M. Kandemir, M. J. Irwin","doi":"10.1109/ICCD.2008.4751936","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751936","url":null,"abstract":"Increases in cache capacity are accompanied by growing wire delays due to technology scaling. Non-uniform cache architecture (NUCA) is one of proposed solutions to reducing the average access latency in such cache designs. While most of the prior NUCA work focuses on data placement, data replacement, and migration related issues, this paper studies the problem of data search (access) in NUCA. In our architecture we arrange sets of banks with equal access latency into rings. Our last access based (LAB) prediction scheme predicts the ring that is expected to contain the required data and checks the banks in that ring first for the data block sought. We compare our scheme to two alternate approaches: searching all rings in parallel, and searching rings sequentially. We show that our LAB ring prediction scheme reduces L2 energy significantly over the sequential and parallel schemes, while maintaining similar performance. Our LAB scheme reduces energy consumption by 15.9% relative to the sequential lookup scheme, and 53.8% relative to the parallel lookup scheme.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133101310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Making register file resistant to power analysis attacks
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751919
Shuo Wang, Fan Zhang, Jianwei Dai, Lei Wang, Z. Shi
Power analysis attacks are a type of side-channel attack that exploits the power consumption of computing devices to retrieve secret information. They are very effective in breaking many cryptographic algorithms, especially those running on low-end processors in embedded systems, sensor nodes, and smart cards. Although many countermeasures against power analysis attacks have been proposed, most of them are software-based and designed for a specific algorithm, and many have been found vulnerable to more advanced attacks. Looking for a low-cost, algorithm-independent solution that can be implemented in many processors and makes all cryptographic algorithms secure against power analysis attacks, we start with the register file, where the operands and results of most instructions are stored. In this paper, we propose RFRF, a register file that stores data with a redundant flipped copy. With the redundant copy and a new precharge phase in write operations, RFRF provides data-independent power consumption on reads and writes for cryptographic algorithms. Although RFRF incurs a large energy overhead, it is enabled only in the security mode. We validate our method with simulations. The results show that the power consumption of RFRF is independent of the values read out from or written to registers; thus, RFRF can help mitigate power analysis attacks.
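The essence of the flipped copy is easy to see under a simple Hamming-weight power model (the model choice is mine): storing a word together with its bitwise complement keeps the total number of 1s constant, so this first-order proxy for power no longer depends on the data.

```python
# Sketch of why a redundant flipped copy equalizes a simple power proxy.
# Under a Hamming-weight power model, the number of 1s stored (or driven)
# leaks the data; storing value plus its complement makes it constant.
WIDTH = 32
MASK = (1 << WIDTH) - 1

def hamming_weight(x):
    return bin(x).count("1")

def stored_weight_with_flip(value):
    """Weight of the pair (value, ~value): always WIDTH, data-independent."""
    return hamming_weight(value) + hamming_weight(~value & MASK)

for v in (0x00000000, 0xDEADBEEF, 0xFFFFFFFF):
    print(f"{v:#010x}: plain={hamming_weight(v):2d}, "
          f"with flipped copy={stored_weight_with_flip(v)}")
# The plain weight varies with the data; the paired weight is always 32.
```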
{"title":"Making register file resistant to power analysis attacks","authors":"Shuo Wang, Fan Zhang, Jianwei Dai, Lei Wang, Z. Shi","doi":"10.1109/ICCD.2008.4751919","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751919","url":null,"abstract":"Power analysis attacks are a type of side-channel attacks that exploits the power consumption of computing devices to retrieve secret information. They are very effective in breaking many cryptographic algorithms, especially those running in low-end processors in embedded systems, sensor nodes, and smart cards. Although many countermeasures to power analysis attacks have been proposed, most of them are software based and designed for a specific algorithm. Many of them are also found vulnerable to more advanced attacks. Looking for a low-cost, algorithm-independent solution that can be implemented in many processors and makes all cryptographic algorithms secure against power analysis attacks, we start with register file, where the operands and results of most instructions are stored. In this paper, we propose RFRF, a register file that stores data with a redundant flipped copy. With the redundant copy and a new precharge phase in write operations, RFRF provides data-independent power consumption on read and write for cryptographic algorithms. Although RFRF has large energy overhead, it is only enabled in the security mode. We validate our method with simulations. The results show that the power consumption of RFRF is independent of the values read out from or written to registers. Thus RFRF can help mitigate power analysis attacks.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129366748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Digital filter synthesis considering multiple adder graphs for a coefficient
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751879
Jeong Han, I. Park
In this paper, a new FIR digital filter synthesis algorithm is proposed that considers multiple adder graphs for a coefficient. The proposed algorithm selects the adder graph that is maximally sharable with the remaining coefficients, whereas previous dependence-graph algorithms consider only one adder graph when implementing a coefficient. In addition, we propose an addition reordering technique to reduce the computational overhead of finding multiple adder graphs. Using the proposed technique, multiple adder graphs are efficiently generated from a seed adder graph obtained with previous dependence-graph algorithms. Experimental results show that the proposed algorithm reduces the hardware cost of FIR filters by 23% and 3.4% on average compared to the Hartley and RAGn-hybrid algorithms, respectively.
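For intuition, a single coefficient typically admits several adder graphs; the sketch below shows two shift-and-add realizations of y = 45*x (the example coefficient is mine), where the factored one exposes an intermediate term (9*x) that could be shared with other coefficients.

```python
# Sketch: two adder graphs realizing y = 45*x with shifts and adds only.
# Which graph is preferable depends on which intermediate terms (here 9*x)
# can be shared with the filter's other coefficients.
def mul45_direct(x):
    # 45 = 32 + 8 + 4 + 1: three additions, no shared intermediates.
    return (x << 5) + (x << 3) + (x << 2) + x

def mul45_factored(x):
    # 45 = 9 * 5: two additions via the intermediate term 9*x.
    t = (x << 3) + x        # 9*x, potentially sharable across coefficients
    return (t << 2) + t     # 45*x

for x in (1, 7, -3):
    assert mul45_direct(x) == mul45_factored(x) == 45 * x
print("both adder graphs compute 45*x")
```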
{"title":"Digital filter synthesis considering multiple adder graphs for a coefficient","authors":"Jeong Han, I. Park","doi":"10.1109/ICCD.2008.4751879","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751879","url":null,"abstract":"In this paper, a new FIR digital filter synthesis algorithm is proposed to consider multiple adder graphs for a coefficient. The proposed algorithm selects an adder graph that can be maximally sharable with the remaining coefficients, while previous dependence-graph algorithms consider only one adder graph when implementing a coefficient. In addition, we propose an addition reordering technique to reduce the computational overhead of finding multiple adder graphs. By using the proposed technique, multiple adder graphs are efficiently generated from a seed adder graph obtained by using previous dependence-graph algorithms. Experimental results show that the proposed algorithm reduces the hardware cost of FIR filters by 23% and 3.4% on average compared to the Hartely and RAGn-hybrid algorithms.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124167815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Contention-aware application mapping for Network-on-Chip communication architectures
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751856
Chen-Ling Chou, R. Marculescu
In this paper, we analyze the impact of network contention on application mapping for tile-based network-on-chip (NoC) architectures. Our main theoretical contribution is an integer linear programming (ILP) formulation of the contention-aware application mapping problem, which aims at minimizing inter-tile network contention. To address the scalability problem of the ILP formulation, we propose a linear programming (LP) approach followed by a mapping heuristic. Taken together, they provide near-optimal solutions while reducing the runtime significantly. Experimental results show that, compared to existing mapping approaches based on communication energy minimization, our contention-aware mapping technique achieves a significant decrease in packet latency (and, implicitly, a throughput increase) with negligible communication energy overhead.
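The paper's exact formulation is not reproduced here, but a generic sketch of how such a mapping ILP is typically written, with binary placement variables and a linearized product term, looks as follows (the contention cost c_pq is a placeholder for the paper's inter-tile contention measure):

```latex
% Generic sketch (not the paper's exact formulation) of an application-
% mapping ILP with binary placement variables x_{i,p} (core i on tile p)
% and linearized products y_{ij,pq}; v_{ij} is the traffic between cores
% i and j, and c_{pq} stands for a per-path contention/latency cost.
\begin{align*}
\min \quad & \sum_{(i,j)} \sum_{p,q} v_{ij}\, c_{pq}\, y_{ij,pq} \\
\text{s.t.} \quad & \textstyle\sum_{p} x_{i,p} = 1 \quad \forall i
  && \text{(each core on exactly one tile)} \\
& \textstyle\sum_{i} x_{i,p} \le 1 \quad \forall p
  && \text{(at most one core per tile)} \\
& y_{ij,pq} \ge x_{i,p} + x_{j,q} - 1
  && \text{(linearization of } x_{i,p}\, x_{j,q}\text{)} \\
& x_{i,p},\, y_{ij,pq} \in \{0,1\}
\end{align*}
```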
{"title":"Contention-aware application mapping for Network-on-Chip communication architectures","authors":"Chen-Ling Chou, R. Marculescu","doi":"10.1109/ICCD.2008.4751856","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751856","url":null,"abstract":"In this paper, we analyze the impact of network contention on the application mapping for tile-based network-on-chip (NoC) architectures. Our main theoretical contribution consists of an integer linear programming (ILP) formulation of the contention-aware application mapping problem which aims at minimizing the inter-tile network contention. To solve the scalability problem caused by ILP formulation, we propose a linear programming (LP) approach followed by an mapping heuristic. Taken together, they provide near-optimal solutions while reducing the runtime significantly. Experimental results show that, compared to other existing mapping approaches based on communication energy minimization, our contention-aware mapping technique achieves a significant decrease in packet latency (and implicitly, a throughput increase) with a negligible communication energy overhead.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121068527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Combined interpolation architecture for soft-decision decoding of Reed-Solomon codes
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751911
Jiangli Zhu, Xinmiao Zhang, Zhongfeng Wang
Reed-Solomon (RS) codes are among the most extensively used error control codes in digital communication and storage systems. Recently, significant advances have been made in algebraic soft-decision decoding (ASD) of RS codes. These algorithms can achieve substantial coding gain with polynomial complexity. One major step of ASD is interpolation. Various techniques have been proposed to reduce the complexity of this step; further speedup is limited by the inherently serial nature of the interpolation algorithm. In this paper, taking bit-level generalized minimum distance (BGMD) ASD as an example, we propose a novel technique to combine the computations from multiple interpolation iterations. Compared to a single-interpolation-iteration architecture for a (255, 239) RS code, the combined architecture achieves 2.7 times the throughput with only 2% area overhead in high signal-to-noise ratio scenarios.
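For a feel of why the interpolation step is inherently serial, here is a toy Koetter-style interpolation over the rationals (real ASD decoders work over GF(2^m) with multiplicities and weighted-degree bookkeeping; this sketch only mirrors the point-by-point iteration structure that the combined architecture collapses):

```python
# Toy sketch of Koetter-style bivariate interpolation over the rationals.
from fractions import Fraction

def evaluate(poly, x, y):
    return sum(c * x**dx * y**dy for (dx, dy), c in poly.items())

def wdeg(poly, wy=2):
    # (1, wy)-weighted degree; wy = 2 is an arbitrary illustrative weight.
    return max(dx + wy * dy for (dx, dy) in poly)

def koetter(points, ydeg_max=1):
    # One candidate polynomial per y-degree: 1 and y.
    cands = [{(0, d): Fraction(1)} for d in range(ydeg_max + 1)]
    for xi, yi in points:                    # inherently serial outer loop
        deltas = [evaluate(g, xi, yi) for g in cands]
        live = [j for j, d in enumerate(deltas) if d != 0]
        if not live:
            continue
        jstar = min(live, key=lambda j: wdeg(cands[j]))
        gstar, dstar = cands[jstar], deltas[jstar]
        for j in live:                       # cancel discrepancies against g*
            if j == jstar:
                continue
            ratio, g = deltas[j] / dstar, dict(cands[j])
            for m, c in gstar.items():
                g[m] = g.get(m, Fraction(0)) - ratio * c
            cands[j] = {m: c for m, c in g.items() if c != 0}
        new = {}                             # g* <- (x - xi) * g*
        for (dx, dy), c in gstar.items():
            new[(dx + 1, dy)] = new.get((dx + 1, dy), Fraction(0)) + c
            new[(dx, dy)] = new.get((dx, dy), Fraction(0)) - xi * c
        cands[jstar] = {m: c for m, c in new.items() if c != 0}
    return min(cands, key=wdeg)

pts = [(0, 1), (1, 2), (2, 3)]               # points on the line y = x + 1
Q = koetter(pts)                             # expect Q proportional to y - x - 1
print(all(evaluate(Q, x, y) == 0 for x, y in pts))  # True
```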
{"title":"Combined interpolation architecture for soft-decision decoding of Reed-Solomon codes","authors":"Jiangli Zhu, Xinmiao Zhang, Zhongfeng Wang","doi":"10.1109/ICCD.2008.4751911","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751911","url":null,"abstract":"Reed-Solomon (RS) codes are one of the most extensively used error control codes in digital communication and storage systems. Recently, significant advancements have been made on algebraic soft-decision decoding (ASD) of RS codes. These algorithms can achieve substantial coding gain with polynomial complexity. One major step of ASD is the interpolation. Various techniques have been proposed to reduce the complexity of this step. Further speedup of this step is limited by the inherent serial nature of the interpolation algorithm. In this paper, taking the bit-level generalized minimum distance (BGMD) ASD as an example, we propose a novel technique to combine the computations from multiple interpolation iterations. Compared to the single interpolation iteration architecture for a (255, 239) RS code, the combined architecture can achieve 2.7 times throughput with only 2% area overhead in high signal-to-noise ratio scenarios.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128625948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Issue system protection mechanisms
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751922
P. Chaparro, J. Abella, J. Carretero, X. Vera
Multi-core microprocessors require a drastic reduction in the FIT (failures-in-time) rate per core to enable a larger number of cores within a fixed FIT budget. Since large arrays like caches and register files are typically protected with either ECC or parity, the issue system becomes one of the largest contributors to the core's FIT rate. Soft errors are an important concern in contemporary microprocessors: particle hits on the components of a processor are expected to create an increasing number of transient errors with each new microprocessor generation. In addition, the number of hard errors in the field is expected to grow as burn-in becomes less effective. Moreover, continuous device shrinking increases the likelihood of in-the-field failures due to rather small defects exacerbated by degradation. This paper proposes on-line mechanisms to detect, classify, and confine in-the-field errors in the issue system of both in-order and out-of-order cores, and to recover to a consistent state. Such mechanisms provide high coverage at a small cost.
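As a generic illustration of the detect-and-recover pattern for an otherwise unprotected structure (this is not the paper's mechanism), consider parity-tagging issue-queue entries and squashing on a mismatch:

```python
# Illustrative sketch (not the paper's mechanism) of detect/recover for
# an unprotected structure: tag each issue-queue entry with parity on
# allocation, check it at issue, and squash-and-replay on error.
def parity(bits: int) -> int:
    return bin(bits).count("1") & 1

class IssueQueueEntry:
    def __init__(self, payload: int):
        self.payload = payload
        self.parity = parity(payload)   # computed when the entry is written

    def read_checked(self) -> int:
        if parity(self.payload) != self.parity:
            # Detected a flipped bit: confine it by squashing the entry
            # and re-dispatching from a known-good consistent state.
            raise RuntimeError("issue-queue parity error: flush and replay")
        return self.payload

e = IssueQueueEntry(0b1011_0010)
print(e.read_checked())        # ok
e.payload ^= 0b0000_0100       # simulate a particle-induced bit flip
try:
    e.read_checked()
except RuntimeError as err:
    print(err)
```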
{"title":"Issue system protection mechanisms","authors":"P. Chaparro, J. Abella, J. Carretero, X. Vera","doi":"10.1109/ICCD.2008.4751922","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751922","url":null,"abstract":"Multi-core microprocessors require reducing the FIT (failures-in-time) rate per core drastically to enable a larger number of cores within a FIT budget. Since large arrays like caches and register flies are typically protected with either ECC or parity, the issue system becomes as one of the largest contributors to the core's FIT rate. Soft-errors are an important concern in contemporary microprocessors. Particle hits on the components of a processor are expected to create an increasing number of transient errors in each new microprocessor generation. In addition, the number of hard-errors in the field is expected to grow as burn-in becomes less effective. Moreover, the continuous device shrinking increases the likelihood of in-the-field failures due to rather small defects exacerbated by degradation. This paper proposes on-line mechanisms to detect and recover to a consistent state, classify and confine in-the-field errors in the issue system of both in-order and out-of-order cores. Such mechanisms provide high coverage at a small cost.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127383191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chip level thermal profile estimation using on-chip temperature sensors
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751897
Yufu Zhang, Ankur Srivastava, M. Zahran
This paper addresses the problem of chip-level thermal profile estimation using runtime temperature sensor readings. We address two challenges: (a) the availability of only a few thermal sensors with constrained locations (sensors cannot be placed just anywhere), and (b) random on-chip power density characteristics due to unpredictable workloads and fabrication variability. First, we model the random power density as a probability density function. Given this random characteristic and runtime thermal sensor readings, we exploit the correlation between the power dissipation of different chip modules to estimate the expected value of temperature at each chip location. Our methods are optimal if the underlying power density is Gaussian. We also present a heuristic to generate chip-level thermal profile estimates when the underlying randomness is non-Gaussian. Experimental results indicate that our method generates highly accurate thermal profile estimates of the entire chip at runtime using only a few thermal sensors.
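When the joint distribution is Gaussian, the expected temperature at unsensed locations given the sensor readings is the standard Gaussian conditional mean; a sketch with made-up covariance numbers:

```python
# Sketch of the Gaussian-optimal estimator: if the full-chip temperature
# vector T is jointly Gaussian with mean mu and covariance S, then the
# expected temperature at unsensed locations u given sensor readings t_s
# is the standard conditional mean. The covariance numbers are invented.
import numpy as np

def conditional_mean(mu, S, sensed_idx, readings):
    unsensed_idx = [i for i in range(len(mu)) if i not in sensed_idx]
    S_ss = S[np.ix_(sensed_idx, sensed_idx)]
    S_us = S[np.ix_(unsensed_idx, sensed_idx)]
    mu_s, mu_u = mu[sensed_idx], mu[unsensed_idx]
    # E[T_u | T_s = readings] = mu_u + S_us S_ss^{-1} (readings - mu_s)
    return mu_u + S_us @ np.linalg.solve(S_ss, readings - mu_s)

mu = np.array([70.0, 75.0, 80.0, 72.0])          # prior mean temps (C)
S = np.array([[4.0, 2.0, 1.0, 0.5],              # correlation from shared
              [2.0, 5.0, 2.0, 1.0],              # workload and layout proximity
              [1.0, 2.0, 6.0, 2.0],
              [0.5, 1.0, 2.0, 4.0]])
print(conditional_mean(mu, S, sensed_idx=[0, 3], readings=np.array([73.0, 71.0])))
```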
{"title":"Chip level thermal profile estimation using on-chip temperature sensors","authors":"Yufu Zhang, Ankur Srivastava, M. Zahran","doi":"10.1109/ICCD.2008.4751897","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751897","url":null,"abstract":"This paper addresses the problem of chip level thermal profile estimation using runtime temperature sensor readings. We address the challenges of a) availability of only a few thermal sensors with constrained locations (sensors cannot be placed just anywhere) b) random on-chip power density characteristics due to unpredictable workloads and fabrication variability. Firstly we model the random power density as a probability density function. Given this random characteristic and runtime thermal sensor readings, we exploit the correlation between power dissipation of different chip modules to estimate the expected value of temperature at each chip location. Our methods are optimal if the underlying power density has Gaussian nature. We also present a heuristic to generate the chip level thermal profile estimates when the underlying randomness is non-Gaussian. Experimental results indicate that our method generates highly accurate thermal profile estimates of the entire chip at runtime using only a few thermal sensors.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127513075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Characterization of granularity and redundancy for SRAMs for optimal yield-per-area
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751865
J. Cha, S. Gupta
Memories constitute a significant proportion of most digital systems, and memory-intensive chips continue to lead the migration to new nano-fabrication processes. As these processes have increasingly high defect rates, especially when first adopted, such early migration necessitates increasing levels of redundancy to obtain high yield (per area). We show that as we move into nanometer processes with high defect rates, the level of redundancy needed to optimize yield-per-area is high enough to significantly influence design tradeoffs. We then report a first step towards considering the overheads of redundancy during design optimization by characterizing the tradeoffs between the granularity of a design and the level of redundancy that optimizes the yield-per-area of static RAMs (SRAMs). Starting with physical layouts of cells and the desired memory size, we derive probabilities of failure at a range of abstractions - transistor level, cell level, and system level. We then estimate the optimal memory granularity, i.e., the size of memory blocks, as well as the optimal number of spare rows and columns that maximize yield-per-area. In particular, we demonstrate the non-monotonic nature of these tradeoffs and present efficient designs for large SRAMs. Our ongoing research is characterizing several other specific tradeoffs, for SRAMs as well as logic blocks.
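A toy model conveys the non-monotonic tradeoff (the independence and area assumptions here are far cruder than the paper's layout-derived failure probabilities): adding spare rows first buys yield faster than it costs area, then the spare area dominates.

```python
# Toy yield-per-area model: rows fail independently with probability
# p_row, a block survives if at most `spares` rows are defective, and
# each spare row costs one row-equivalent of area. All numbers invented.
from math import comb

def block_yield(rows, p_row, spares):
    n = rows + spares
    # Survives if at most `spares` of the n rows are defective.
    return sum(comb(n, k) * p_row**k * (1 - p_row)**(n - k)
               for k in range(spares + 1))

def yield_per_area(rows, p_row, spares):
    area = rows + spares           # area in row-equivalents
    return block_yield(rows, p_row, spares) / area

rows, p_row = 128, 0.01
for r in range(0, 9):
    print(f"spares={r}: yield-per-area={yield_per_area(rows, p_row, r):.6f}")
# yield-per-area first rises (yield gains dominate), then falls
# (spare area dominates): the optimum lies at an interior r.
```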
{"title":"Characterization of granularity and redundancy for SRAMs for optimal yield-per-area","authors":"J. Cha, S. Gupta","doi":"10.1109/ICCD.2008.4751865","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751865","url":null,"abstract":"Memories are significant proportions of most digital systems and memory-intensive chips continue to lead the migration to new nano-fabrication processes. As these processes have increasingly higher defect rates, especially when they are first adopted, such early migration necessitates the use of increasing levels of redundancy to obtain high yield (per area). We show that as we move into nanometer processes with high defect rates, the level of redundancy needed to optimize yield-per-area is sufficiently high so as to significantly influence design tradeoffs. We then report a first step towards considering the overheads of redundancy during design optimization by characterizing the tradeoffs between the granularity of a design and the level of redundancy that optimizes the yield-per-area of static RAMs (SRAMs). Starting with physical layouts of cells and the desired memory size, we derive probabilities of failure at a range of abstractions - transistor level, cell level, and system level. We then estimate optimal memory granularity, i.e., the size of memory blocks, as well as the optimal number of spare rows and columns that maximize yield-per-area. In particular, we demonstrate the non-monotonic nature of these tradeoffs and present efficient designs for large SRAMs. Our ongoing research is characterizing several other specific tradeoffs, for SRAMs as well as logic blocks.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121927630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Architecture implementation of an improved decimal CORDIC method
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751846
J. L. Sánchez, H. Mora, J. M. Pascual, A. Jimeno-Morenilla
Since radix-10 arithmetic has been gaining renewed importance over the last few years, high-performance decimal systems and techniques are in high demand. In this paper, a modification of the CORDIC method for decimal arithmetic is proposed in order to improve calculations. The algorithm works with BCD operands, and no conversion to binary is needed. A significant reduction in the number of iterations is achieved in comparison to the original decimal CORDIC method. Experiments showing the advantages of the new method are described, and delay results obtained from an FPGA implementation of the method are also reported.
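For reference, the classic radix-2 CORDIC rotation that such decimal variants build on looks like this (the paper's method instead performs decimal micro-rotations on BCD operands; this binary sketch is only for contrast):

```python
# Sketch of the classic (binary, radix-2) CORDIC rotation for reference;
# decimal CORDIC replaces the radix-2 micro-rotations with decimal (BCD)
# ones to cut the iteration count for radix-10 arithmetic.
import math

def cordic_sin_cos(theta, iterations=32):
    """Compute (cos, sin) of theta (radians, |theta| < pi/2) by rotating
    the vector (1, 0) through the micro-angles atan(2^-i)."""
    angles = [math.atan(2.0**-i) for i in range(iterations)]
    K = 1.0
    for i in range(iterations):    # scale factor: product of cos(atan 2^-i)
        K *= 1.0 / math.sqrt(1.0 + 2.0**(-2 * i))
    x, y, z = 1.0, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotate toward the residual angle
        x, y = x - d * y * 2.0**-i, y + d * x * 2.0**-i
        z -= d * angles[i]
    return x * K, y * K

c, s = cordic_sin_cos(0.6)
print(c - math.cos(0.6), s - math.sin(0.6))  # both errors ~1e-9
```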
{"title":"Architecture implementation of an improved decimal CORDIC method","authors":"J. L. Sánchez, H. Mora, J. M. Pascual, A. Jimeno-Morenilla","doi":"10.1109/ICCD.2008.4751846","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751846","url":null,"abstract":"Since radix-10 arithmetic has been gaining renewed importance over the last few years, high performance decimal systems and techniques are highly demanded. In this paper, a modification of the CORDIC method for decimal arithmetic is proposed so as to improve calculations. The algorithm works with BCD operands and no conversion to binary is needed. A significant reduction in the number of iterations in comparison to the original decimal CORDIC method is achieved. The experiments showing the advantages of the new method are described. Also, the results with regard to delay obtained by means of an FPGA implementation of the method are shown.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121878167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}