H. D. Groot, M. Ashouei, J. Penders, V. Pop, M. Vidojkovic, B. Gyselinckx, R. Yazicioglu
The cost of health care in first-world countries is increasing dramatically as a result of advances in medicine, an ageing population and an increasingly unhealthy lifestyle. Personal health care concepts, in which sensors within and around the body monitor and measure all kinds of physiological signals, can be a highly beneficial addition to conventional medical care. This concept allows patients to stay in their home environment and hence have a better quality of life at lower cost. For these reasons, research and development is ongoing on many body-worn and implantable sensor nodes. In this paper it is shown that application knowledge and an understanding of the contribution of different components to the system power consumption are the best starting point for making optimal trade-offs in the system design. This minimizes the overall power consumption of a sensor node without losing track of the major functionality needed. Besides the importance of system optimization, it is also shown that new components and circuit techniques need to be developed to achieve an orders-of-magnitude increase in energy efficiency. This is a must to realize ultra-thin electrocardiogram patches as well as more demanding small-form-factor nodes such as real-time electroencephalogram (EEG) processing for brain-computer interaction or neuro-implants.
{"title":"Human++: Key Challenges and Trade-offs in Embedded System Design for Personal Health Care","authors":"H. D. Groot, M. Ashouei, J. Penders, V. Pop, M. Vidojkovic, B. Gyselinckx, R. Yazicioglu","doi":"10.1109/DSD.2011.115","DOIUrl":"https://doi.org/10.1109/DSD.2011.115","url":null,"abstract":"The cost of health care in first-world countries is increasing dramatically as a result of advances in medicine, a population that is becoming older and an increasingly unhealthy lifestyle. Personal health care concepts where sensors within and around the body monitor and measure all kind of physiological signals can be an addition to medicare with high benefits. This concept allows patients to stay in their home environment and hence have a better quality of life with lower costs involved. For these reasons research and development is ongoing on many body worn and implantable sensor nodes. In this paper it is shown that application knowledge and understanding the contribution of different components to the system power consumption is the best starting point to make optimal trade-offs in the system design. This will minimize the overall power consumption of a sensor node without losing track of the major functionality needed. Besides the importance of system optimization, it is also shown that new components and circuit techniques need to be developed to achieve orders of magnitude increase in energy efficiency. This is a must to realize ultra-thin electrocardiogram patches as well as more demanding nodes with a small form factor like real-time Electro Encephalogram processing for brain computer interaction or neuro-implants.","PeriodicalId":267187,"journal":{"name":"2011 14th Euromicro Conference on Digital System Design","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125146000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
I. Algredo-Badillo, C. F. Uribe, R. Cumplido, M. Morales-Sandoval
Cryptographic algorithms are used to enable security services that are at the core of modern communication systems. In particular, hash function algorithms are widely used to provide data integrity and authentication services. These algorithms perform a number of complex operations on the input data, so it is important to have novel designs that can be efficiently mapped to hardware architectures. Hash functions perform their internal operations in an iterative fashion, which opens the possibility of exploring several implementation strategies. In this paper, two different schemes to improve the performance of the hardware implementation of the SHA-2 family of algorithms are proposed. The main focus of the proposed schemes is to reduce the critical path by reordering the operations required at each iteration of the algorithm. Implementation results on an FPGA device show an improvement in the performance of the SHA-256 algorithm when compared against similar previously proposed approaches.
{"title":"Novel Hardware Architecture for Implementing the Inner Loop of the SHA-2 Algorithms","authors":"I. Algredo-Badillo, C. F. Uribe, R. Cumplido, M. Morales-Sandoval","doi":"10.1109/DSD.2011.75","DOIUrl":"https://doi.org/10.1109/DSD.2011.75","url":null,"abstract":"Cryptographic algorithms are used to enable security services that are the core of modern communication systems. In particular, Hash functions algorithms are widely used to provide services of data integrity and authentication. These algorithms are based on performing a number of complex operations on the input data, thus it is important to count with novel designs that can be efficiently mapped to hardware architectures. Hash functions perform internal operations in an iterative fashion, which open the possibility of exploring several implementation strategies. In the paper, two different schemes to improve the performance of the hardware implementation of the SHA-2 family of algorithms are proposed. The main focus of the proposed schemes is to reduce the critical path by reordering the operations required at each iteration of the algorithm. Implementation results on an FPGA device show an improvement on the performance on the SHA-256 algorithm when compared against similar previously proposed approaches.","PeriodicalId":267187,"journal":{"name":"2011 14th Euromicro Conference on Digital System Design","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116513311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xu Guo, Meeta Srivastav, Sinan Huang, D. Ganta, Michael B. Henry, L. Nazhandali, P. Schaumont
The NIST SHA-3 competition aims to select a new secure hash standard. Hardware implementation quality is an important factor in evaluating the SHA-3 finalists. However, a comprehensive methodology to benchmark the five final-round SHA-3 candidates in ASIC is challenging. Many factors need to be considered, including application scenarios, target technologies and optimization goals. This work describes the detailed steps in the silicon implementation of a SHA-3 ASIC. The plan of ASIC prototyping with all the SHA-3 finalists, as an integral part of our SHA-3 ASIC evaluation project, is motivated by our previously proposed methodology, which defines a consistent and systematic approach to move a SHA-3 hardware benchmark process from FPGA prototyping to ASIC implementation. We have designed the remaining five SHA-3 candidates in a 0.13 µm IBM standard-cell CMOS process. In this paper, we discuss our proposed methodology for SHA-3 ASIC evaluation and report the latest results based on post-layout simulation of the five SHA-3 finalists with Round 3 tweaks.
{"title":"Pre-silicon Characterization of NIST SHA-3 Final Round Candidates","authors":"Xu Guo, Meeta Srivastav, Sinan Huang, D. Ganta, Michael B. Henry, L. Nazhandali, P. Schaumont","doi":"10.1109/DSD.2011.74","DOIUrl":"https://doi.org/10.1109/DSD.2011.74","url":null,"abstract":"The NIST SHA-3 competition aims to select a new secure hash standard. Hardware implementation quality is an important factor in evaluating the SHA-3 finalists. However, a comprehensive methodology to benchmark five final round SHA-3 candidates in ASIC is challenging. Many factors need to be considered, including application scenarios, target technologies and optimization goals. This work describes detailed steps in the silicon implementation of a SHA-3 ASIC. The plan of ASIC prototyping with all the SHA-3 finalists, as an integral part of our SHA-3 ASIC evaluation project, is motivated by our previously proposed methodology, which defines a consistent and systematic approach to move a SHA-3 hardware benchmark process from FPGA prototyping to ASIC implementation. We have designed the remaining five SHA-3 candidates in 0.13 $mu m$ IBM process using standard-cell CMOS technology. In this paper, we discuss our proposed methodology for SHA-3 ASIC evaluation and report the latest results based on post-layout simulation of the five SHA-3 finalists with Round 3 tweaks.","PeriodicalId":267187,"journal":{"name":"2011 14th Euromicro Conference on Digital System Design","volume":"21 11","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132935184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kameswar Rao Vaddina, A. Rahmani, Khalid Latif, P. Liljeberg, J. Plosila
Three-dimensional technology offers greater device integration, reduced signal delay and reduced interconnect power. It also provides greater design flexibility by allowing heterogeneous integration. However, 3D technology exacerbates on-chip thermal issues and increases packaging and cooling costs. In this work, a 3D thermal model of a stacked network-on-chip system is developed and thermal analysis is performed in order to analyze different job allocation and scheduling schemes using finite element simulations. A steady-state heat transfer analysis of the 3D stacked structure has been performed. We have analyzed the effect of variation of die power consumption, with and without hotspots, on peak temperatures in different layers of the stack. An optimal die placement solution is also provided based on the maximum temperature attained by the individual silicon dies.
{"title":"Thermal Analysis of Job Allocation and Scheduling Schemes for 3D Stacked NoC's","authors":"Kameswar Rao Vaddina, A. Rahmani, Khalid Latif, P. Liljeberg, J. Plosila","doi":"10.1109/DSD.2011.87","DOIUrl":"https://doi.org/10.1109/DSD.2011.87","url":null,"abstract":"Three-dimensional technology offers greater device integration, reduced signal delay and reduced interconnect power. It also provides greater design flexibility by allowing heterogeneous integration. However, 3D technology exacerbates the on-chip thermal issues and increases packaging and cooling costs. In this work, a 3D thermal model of a stacked network-on-chip system is developed and thermal analysis is performed in order to analyze different job allocation and scheduling schemes using finite element simulations. The steady-state heat transfer analysis on the 3D stacked structure has been performed. We have analyzed the effect of variation of die power consumption, with and without hotspots, on peak temperatures in different layers of the stack. The optimal die placement solution is also provided based on the maximum temperature attained by the individual silicon dies.","PeriodicalId":267187,"journal":{"name":"2011 14th Euromicro Conference on Digital System Design","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133411299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Khalid Latif, A. Rahmani, Kameswar Rao Vaddina, T. Seceleanu, P. Liljeberg, H. Tenhunen
Reliability of embedded systems and devices is becoming a challenge with technology scaling. To deal with these reliability issues, fault-tolerant solutions are needed. The design paradigm for future System-on-Chip (SoC) implementation is the Network-on-Chip (NoC). Fault tolerance in NoC can be achieved at many abstraction levels. Many fault-tolerant architectures and routing algorithms have already been proposed for NoC, but the utilization of resources that are indirectly affected by faults has yet to be addressed. In this paper, we propose a NoC architecture which sustains overall system performance by utilizing resources that cannot be used by other architectures under faults. An approach towards a proper virtual-channel (VC) sharing strategy is proposed, based on communication bandwidth requirements. The technique can be applied to any NoC architecture, including 3D NoCs. Extensive quantitative experiments with synthetic benchmarks, including uniform, transpose and negative exponential distribution (NED) traffic, demonstrate considerable improvement in terms of performance sustainability under faulty conditions compared to existing VC-based NoC architectures.
{"title":"Enhancing Performance Sustainability of Fault Tolerant Routing Algorithms in NoC-Based Architectures","authors":"Khalid Latif, A. Rahmani, Kameswar Rao Vaddina, T. Seceleanu, P. Liljeberg, H. Tenhunen","doi":"10.1109/DSD.2011.85","DOIUrl":"https://doi.org/10.1109/DSD.2011.85","url":null,"abstract":"Reliability of embedded systems and devices is becoming a challenge with technology scaling. To deal with the reliability issues, fault tolerant solutions are needed. The design paradigm for future System-on-Chip (SoC) implementation is Network-on-Chip (NoC). Fault tolerance in NoC can be achieved at many abstraction levels. Many fault tolerant architectures and routing algorithms have already been proposed for NoC but the utilization of resources, affected indirectly by faults is yet to be addressed. In this paper, we propose a NoC architecture, which sustains the overall system performance by utilizing resources, which cannot be used by other architectures under faults. An approach towards a proper virtual-channel (VC) sharing strategy is proposed, based on communication bandwidth requirements. The technique can be applied to any NoC architecture, including 3-D NoCs. Extensive quantitative experiments with synthetic benchmarks, including uniform, transpose and negative exponential distribution (NED), demonstrate considerable improvement in terms of performance sustainability under faulty conditions compared to existing VC-based NoC architectures.","PeriodicalId":267187,"journal":{"name":"2011 14th Euromicro Conference on Digital System Design","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133901129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Radio-frequency identification (RFID) technology has evolved from a simple identification technique into an enabler for more sophisticated applications such as advanced data storage and proof of origin. Even so-called passive RFID tags, which receive their power from a radio-frequency field, tend to implement more and more functionality, which leads to increased control complexity on the tag. Current low-cost tags are implemented as hardwired state machines, which are inflexible and also inefficient when control complexity increases. In this work, we present a flexible tag platform that is based on a simple 8-bit microcontroller optimized for low chip area and low power consumption. We demonstrate the efficiency of our approach by implementing a near-field communication (NFC) compatible tag with advanced file-access functionality and security features in hardware. Results show that the current consumption of the microcontroller is below 10 µA at 106 kHz and that the control part of the flexible tag platform requires a chip area of 10 kGEs, making it suitable for low-cost RFID devices.
{"title":"Hardware Implementation of a Flexible Tag Platform for Passive RFID Devices","authors":"Thomas Plos, Martin Feldhofer","doi":"10.1109/DSD.2011.43","DOIUrl":"https://doi.org/10.1109/DSD.2011.43","url":null,"abstract":"Radio-frequency identification (RFID) technology has emerged from a simple identification technique towards the enabler for more sophisticated applications like advanced data storage and proof of origin. Even so-called passive RFID tags that receive their power from a radio-frequency field tend to implement more functionality which leads to increased control complexity on them. Current low-cost tags are implemented by hardwired state machines which are inflexible and also inefficient when control complexity increases. In this work, we present a flexible tag platform that is based on a simple 8-bit microcontroller optimized for low chip area and low power consumption. We demonstrate the efficiency of our approach by implementing a near-field communication (NFC) compatible tag with advanced file-access functionality and security features in hardware. Results show that the power consumption of the microcontroller is below 10, textmu A at 106, kHz and that the control part of the flexible tag platform requires a chip area of 10, kGEs making it suitable for low-cost RFID devices.","PeriodicalId":267187,"journal":{"name":"2011 14th Euromicro Conference on Digital System Design","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116098192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The very high levels of integration and submicron device sizes used in current and emerging VLSI technologies for SRAM-based FPGAs lead to higher occurrences of defects and operational faults. Thus, there is a critical need for fault tolerance and reconfiguration techniques for SRAM-based FPGAs to increase chip reliability through field reconfiguration. We first propose a technique utilizing the master-slave principle to tolerate faults in logic or cells of SRAM-based FPGAs. We show that this architectural technique can be used to build redundancy for defect and fault tolerance with limited area and performance overhead. Our algorithm improves the reliability of SRAM-based FPGAs by performing two operations: TMR (triple modular redundancy), in which CLBs are used to triplicate a logic function whose value is obtained at the voter output, and partitioning, in which the design is partitioned into a set of MSUs (master-slave units) to reduce the amount of configuration memory required. In response to a component failure, a functionally equivalent MSU that does not rely on the faulty component replaces the affected MSU. Our technique can handle a large number of faults (we show tolerance of 16 logic faults in look-up tables (LUTs) belonging to the same MSU). Experimental results conducted on a subset of the ITC'99 benchmarks demonstrate a high level of reliability in terms of fault tolerance with low hardware overhead compared to TMR, which has a 5x-6x area overhead and high power consumption.
{"title":"Fault Tolerance of Multiple Logic Faults in SRAM-Based FPGA Systems","authors":"Farid Lahrach, A. Doumar, E. Châtelet","doi":"10.1109/DSD.2011.33","DOIUrl":"https://doi.org/10.1109/DSD.2011.33","url":null,"abstract":"The very hight levels of integration and submicron device sizes used in current and emerging VLSI technologies for SRAM-based FPGAs lead to higher occurrences of defects and operational faults. Thus, there is a critical need for fault tolerance and reconfiguration techniques for SRAM-based FPGAs to increase chip reliability with field reconfiguration. We first propose a technique utilizing the principle of master slave to tolerate logic or cells in SRAM-based FPGAs. We show that this architectural technique can be used to build redundancy for defect and fault tolerance with limited area and performance overhead. Our algorithm improves reliability of the SRAM-based FPGAs by performing two operations: TMR (triple modular redundancy) (in which CLBs are used to triplicate a logic function whose value is obtained at the voter output) and partitioning (in which the design is partitioned into a set of MSUs (master-slave unit) to reduce the amount of configuration memory required). In response to a component failure, a functionality equivalent MSU that does not rely on the faulty component replaces the affected MSU. Our technique can handle a large numbers of faults (we show tolerance of 16 logic faults in look-up tables LUTs belonging to the same MSU). Experimental results conducted on a subset of the ITC'99 benchmarks demonstrate a high level of reliability in term of fault tolerance with low hardware overhead compared to TMR which has a 5x- 6x area overhead and high power consumption.","PeriodicalId":267187,"journal":{"name":"2011 14th Euromicro Conference on Digital System Design","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123460952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bo Liu, H. Pourshaghaghi, Sebastian M. Londono, J. P. D. Gyvez
Sub-threshold circuit design has become a popular approach for building energy-efficient digital circuits. Its main drawbacks are performance degradation due to the exponentially reduced driving current and increased sensitivity to process variation. To obtain energy savings while reducing performance degradation, we propose the design of a robust sub-threshold library and post-silicon tuning using an adaptive fuzzy logic controller which performs body bias scaling. We show that our methodology is able to restore the target performance despite process variation, consequently making the system more energy efficient and achieving maximum yield.
{"title":"Process Variation Reduction for CMOS Logic Operating at Sub-threshold Supply Voltage","authors":"Bo Liu, H. Pourshaghaghi, Sebastian M. Londono, J. P. D. Gyvez","doi":"10.1109/DSD.2011.21","DOIUrl":"https://doi.org/10.1109/DSD.2011.21","url":null,"abstract":"Sub-threshold circuit design has become a popular approach for building energy efficient digital circuits. The main drawbacks are performance degradation due to the exponentially reduced driving current, and the effect of increased sensitivity to process variation. To obtain energy savings while reducing performance degradation, we propose the design of a robust sub-threshold library and post-silicon tuning using an adaptive fuzzy logic controller which performs body bias scaling. We show that our methodology is able to fix the performance, consequently, making the system more energy efficient and achieving maximum yield.","PeriodicalId":267187,"journal":{"name":"2011 14th Euromicro Conference on Digital System Design","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128727257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bilinear maps, or pairings, on elliptic curves are an active area of research in modern cryptology, with applications ranging from cryptanalysis (e.g. the MOV attack) through identity-based encryption to short signature schemes. Many parameterisations and implementation options for pairing-based cryptography have been investigated in the recent past. Elliptic curves over prime fields are often preferred for software implementation, whereas extension fields of characteristic two and three offer advantages for implementation in hardware. In the ideal case, a hardware accelerator for pairing-based cryptography can support all three types of field to ensure interoperability with a broad spectrum of applications. This need has motivated the design of so-called unified multipliers, which are basically multipliers that integrate different types of operands (e.g. integers and polynomials) into a single data path. In the present paper, we introduce a unified multiply/accumulate unit for signed/unsigned integers as well as binary and ternary polynomials. The multiplier generates partial products using a Redundant Signed-Digit (RSD) representation that allows for an efficient combination of all three operand types into one data path. In addition, our design takes advantage of a high-radix encoding scheme for integers and binary polynomials to reduce the overall number of partial products and utilise the data path in an optimal way. We compare our multiplier with a previous radix-2 implementation of Ozturk et al. and analyse the differences in terms of silicon area and critical path delay. The unified multiply/accumulate unit described in this paper can be used in embedded systems like smart cards, either as the arithmetic core of a cryptographic co-processor or as a functional unit of an application-specific processor.
{"title":"A Unified Multiply/Accumulate Unit for Pairing-Based Cryptography over Prime, Binary and Ternary Fields","authors":"Tobias Vejda, J. Großschädl, D. Page","doi":"10.1109/DSD.2011.89","DOIUrl":"https://doi.org/10.1109/DSD.2011.89","url":null,"abstract":"Bilinear maps, or pairings, on elliptic curves are an active area of research in modern cryptology with applications ranging from cryptanalysis (e.g. MOV attack) over identity-based encryption to short signature schemes. Many parameterisations and implementation options for pairing-based cryptography have been investigated in the recent past. Elliptic curves over prime fields are often preferred for software implementation, whereas extension fields of characteristic two and three offer advantages for implementation in hardware. In the ideal case, a hardware accelerator for pairing-based cryptography can support all three types of field to ensure inter-operability with a broad spectrum of applications. This need has motivated the design of so-called unified multipliers, which are basically multipliers that integrate different types of operands (e.g. integers and polynomials) into a single data path. In the present paper, we introduce a unified multiply/accumulate unit for signed/unsigned integers as well as binary and ternary polynomials. The multiplier generates partial products using a Redundant Signed-Digit (RSD) representation that allows for efficient combination of all three operand types into one data path. In addition, our design takes advantage of a high-radix encoding scheme for integers and binary polynomials to reduce the overall number of partial products and utilise the data path in an optimal way. We compare our multiplier with a previous radix-2 implementation of Ozturk et al and analyse the differences in terms of silicon area and critical path delay. The unified multiply/accumulate unit described in this paper can be used in embedded systems like smart cards, either as arithmetic core of a cryptographic co-processor, or as functional unit of an application-specific processor.","PeriodicalId":267187,"journal":{"name":"2011 14th Euromicro Conference on Digital System Design","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127443234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HW-SW partitioning is a key problem in HW-SW co-design of embedded systems that has been studied extensively in the past. All proposed approaches are top-down flows that start from a homogeneous formal specification of the system and determine an optimal partitioning. Thus, the proposed techniques neither exploit reuse nor reconsider the HW-SW partitioning of an already designed platform. This paper proposes an extension of traditional flows that allows reuse and automatic generation of components and interfaces. The final flow has been applied to a complex industrial platform to prove the effectiveness and the advantages of the proposed approach.
{"title":"Automatic Interface Generation for Component Reuse in HW-SW Partitioning","authors":"N. Bombieri, F. Fummi, S. Vinco, D. Quaglia","doi":"10.1109/DSD.2011.105","DOIUrl":"https://doi.org/10.1109/DSD.2011.105","url":null,"abstract":"HW-SW partitioning is a key problem in HW-SW code sign of embedded systems studied extensively in the past. All proposed approaches are top down flows, that start from a homogeneous formal specification of the system and determine an optimal partitioning. Thus, the proposed techniques do not exploit reuse nor reconsider the HWSW partitioning of an already designed platform. This paper proposes an extension of traditional flows that allows reuse and automatic generation of components and interfaces. The final flow has been applied to a complex industrial platform to prove the effectiveness and the advantages of the proposed approach.","PeriodicalId":267187,"journal":{"name":"2011 14th Euromicro Conference on Digital System Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130757038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}