A common approach to protecting confidential information is to use a stream cipher, which combines plaintext bits with a pseudo-random bit sequence. Among the existing stream ciphers, Non-Linear Feedback Shift Register (NLFSR)-based ones provide the best trade-off between cryptographic security and hardware efficiency. In this paper, we show how to further improve the hardware efficiency of the Grain stream cipher. By transforming the NLFSR of Grain from its original Fibonacci configuration to the Galois configuration and by introducing new hardware solutions, we double the throughput of the 80- and 128-bit key 1 bit/cycle architectures of Grain with no area or power penalty.
{"title":"An Improved Hardware Implementation of the Grain Stream Cipher","authors":"S. Mansouri, E. Dubrova","doi":"10.1109/DSD.2010.49","DOIUrl":"https://doi.org/10.1109/DSD.2010.49","journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools"},"publicationDate":"2010-09-01","platform":"Semanticscholar","paperid":"117290743"}
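As a rough illustration of the two ideas in this abstract (XOR-ing plaintext bits with a pseudo-random sequence, and the Fibonacci versus Galois register configurations), the Python sketch below uses a toy 4-bit linear feedback shift register. It is not Grain and it is not secure: Grain combines an NLFSR with an LFSR, and the paper's Fibonacci-to-Galois transformation of the nonlinear register is not reproduced here. The tap positions, the seed, and all function names are assumptions chosen for illustration.

```python
def fibonacci_step(state):
    """Toy 4-bit Fibonacci LFSR: the taps are XORed into ONE feedback bit
    that enters at the top stage. Returns (next_state, output_bit)."""
    out = state & 1
    feedback = (state ^ (state >> 1)) & 1      # taps on bits 0 and 1
    return (state >> 1) | (feedback << 3), out


def galois_step(state):
    """Toy 4-bit Galois LFSR: the output bit is XORed into several stages
    in parallel (mask 0b1100). Returns (next_state, output_bit)."""
    out = state & 1
    state >>= 1
    if out:
        state ^= 0b1100
    return state, out


def keystream(step, seed, nbits):
    """Collect nbits output bits from the register started at seed."""
    state, bits = seed, []
    for _ in range(nbits):
        state, out = step(state)
        bits.append(out)
    return bits


def period(step, seed):
    """Number of steps until the register returns to its seed state."""
    state, n = step(seed)[0], 1
    while state != seed:
        state, n = step(state)[0], n + 1
    return n


def xor_bits(data_bits, key_bits):
    """Stream-cipher combiner: bitwise XOR of data with keystream."""
    return [d ^ k for d, k in zip(data_bits, key_bits)]


fib = keystream(fibonacci_step, 0b0001, 15)
gal = keystream(galois_step, 0b0001, 15)

plain = [1, 0, 1, 1, 0, 0, 1, 0]
cipher = xor_bits(plain, fib)          # encrypt
recovered = xor_bits(cipher, fib)      # decrypt with the same keystream
```

In the Fibonacci form all taps feed a single XOR in front of one stage, whereas in the Galois form the XORs sit between stages, which is generally why a Galois register tends to have a shorter feedback path in hardware. For this toy register both forms have period 15 and emit the same bit sequence up to a phase shift.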
Ye Gao, Ryusuke Egawa, H. Takizawa, Hiroaki Kobayashi
Nowadays, multimedia applications (MMAs) form an important workload for general-purpose processors. Although the vector architecture is considered the most promising candidate for media processing, the traditional vector architecture executes MMAs inefficiently. This paper proposes a media-oriented vector architecture, which improves on the traditional one with a load-forwarding mechanism. The load-forwarding mechanism overcomes the inefficient use of memory bandwidth. As a result, the proposed architecture achieves higher performance at lower hardware cost than the traditional one. This paper evaluates the proposed architecture across architectural design parameters and identifies the most efficient size for the vector architecture when executing MMAs.
{"title":"A Load-Forwarding Mechanism for the Vector Architecture in Multimedia Applications","authors":"Ye Gao, Ryusuke Egawa, H. Takizawa, Hiroaki Kobayashi","doi":"10.1109/DSD.2010.93","DOIUrl":"https://doi.org/10.1109/DSD.2010.93","journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools"},"publicationDate":"2010-09-01","platform":"Semanticscholar","paperid":"132383652"}
This paper proposes a novel all-optical router as a building block for a scalable wavelength-switched optical NoC. The proposed optical router, named AOR, performs passive routing of optical data streams based on their wavelengths. By using wavelength routing, AOR eliminates the need for electrical resource reservation and the corresponding latency and area overheads. Taking advantage of the Wavelength Division Multiplexing (WDM) technique, the proposed architecture is capable of data multicasting, concurrent with unicast data transmission, with high bandwidth and low power dissipation, without imposing noticeable area and latency overheads. Comparing AOR against previously proposed optical routers, we conclude that the proposed router architecture reduces optical insertion loss, electrical power consumption, and the number of microrings, and also improves the scalability of the on-chip network.
{"title":"Scalable Architecture for Wavelength-Switched Optical NoC with Multicasting Capability","authors":"S. Koohi, A. Shafaei, S. Hessabi","doi":"10.1109/DSD.2010.11","DOIUrl":"https://doi.org/10.1109/DSD.2010.11","journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools"},"publicationDate":"2010-09-01","platform":"Semanticscholar","paperid":"132332286"}
Modern embedded systems come with contradictory design constraints. On the one hand, these systems often target mass production and battery-based devices, and therefore should be cheap and power efficient. On the other hand, they need to achieve high (real-time) performance. This wide spectrum of design requirements leads to complex heterogeneous system-on-chip (SoC) architectures. The complexity of embedded systems forces designers to model and simulate systems and their components to explore the wide range of design choices. Such design space exploration is especially needed during the early design stages, where the design space is at its largest. Because the design space of real problems grows exponentially and multiple criteria must be considered, multi-objective evolutionary algorithms (MOEAs) are often used to trim down a large design space into a finite set of points and provide the designer with a set of trade-off solutions with respect to the design criteria. Interpreting the search results (e.g., where the Pareto points are located), understanding their relations, and analyzing how the design space was searched by such algorithms is invaluable to the designer. To this end, this paper presents a novel interactive visualization tool, based on tree visualization, to understand the search dynamics of a MOEA and to visualize where the optimum design points are located in the design space and what objective values they have.
{"title":"Visualization of Multi-objective Design Space Exploration for Embedded Systems","authors":"T. Taghavi, A. Pimentel","doi":"10.1109/DSD.2010.75","DOIUrl":"https://doi.org/10.1109/DSD.2010.75","journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools"},"publicationDate":"2010-09-01","platform":"Semanticscholar","paperid":"130119871"}
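The "Pareto points" mentioned in this abstract can be made concrete with a small Python sketch. It filters a set of candidate designs down to the non-dominated ones, assuming every objective is to be minimized. The design points and the two objectives (cost, power) are invented for illustration and do not come from the paper, and a real MOEA would do far more than this filtering step.

```python
def dominates(a, b):
    """True if design a is no worse than b in every objective and
    strictly better in at least one (all objectives minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]


# Hypothetical designs as (cost, power) pairs; lower is better for both.
designs = [(10, 5.0), (8, 7.0), (12, 4.0), (11, 6.0), (8, 9.0)]
front = pareto_front(designs)
```

Here `(11, 6.0)` is dominated by `(10, 5.0)` and `(8, 9.0)` by `(8, 7.0)`, so three trade-off designs remain for the designer to choose from.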
D. Bode, Mladen Berekovic, A. Borkowski, Ludger Buker
In ASICs with structure sizes of 65 nm and below, the requirements for precise and robust clock networks continuously increase. High-speed circuits already use full-custom clock-meshes instead of buffer trees. Recently, new clock-mesh synthesis tools with more automation have become available, which better suit ASIC design flows. This paper provides a QoR analysis of these meshes versus highly optimized buffer trees with respect to timing and power. Furthermore, we analyze the sensitivity of the topologies to OCV. For this purpose we performed a Monte Carlo analysis in SPICE as the basis for STA. A design-dependent evaluation was performed by applying the clock networks and analysis to six different designs. Independent of OCV, the clock-mesh reduces the global skew by up to 65% at the expense of a medial increase in average power consumption of 57% when compared to the buffer tree. With a focus on further reducing power dissipation, possible improvements to the automated clock-mesh implementation are proposed.
{"title":"QoR Analysis of Automated Clock-Mesh Implementation under OCV Consideration","authors":"D. Bode, Mladen Berekovic, A. Borkowski, Ludger Buker","doi":"10.1109/DSD.2010.60","DOIUrl":"https://doi.org/10.1109/DSD.2010.60","journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools"},"publicationDate":"2010-09-01","platform":"Semanticscholar","paperid":"133238897"}
Multi-core technology can provide valuable benefits for improving safety-critical embedded systems. Examples range from multi-core architectures introducing system redundancy, to asymmetric multiprocessing allowing high software diversity, to hypervisors reducing system complexity. Can these benefits be taken for granted without considering the drawbacks and effects that come with them? The move to multi-core based architectures is already underway. Sooner, rather than later, we will be forced to discover and resolve its issues for safety-related applications. This paper is an attempt to evaluate the value of multi-core for safety-critical systems on a broader level.
{"title":"Multi-core Technology -- Next Evolution Step in Safety Critical Systems for Industrial Applications?","authors":"F. Reichenbach, Alexander Wold","doi":"10.1109/DSD.2010.50","DOIUrl":"https://doi.org/10.1109/DSD.2010.50","journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools"},"publicationDate":"2010-09-01","platform":"Semanticscholar","paperid":"130360823"}
This paper proposes using polymorphic electronics to design digital circuit controllers that degrade gracefully when an adverse situation arises, e.g. when the battery runs low or the chip temperature crosses a safe level. In the proposed approach, the next-state logic of the controller is designed using polymorphic gates. Polymorphic gates exhibit two or more logic functions depending on a specific condition (e.g. Vdd level or special signals). This allows smart reconfiguration of the circuit. An algorithm for designing gracefully degrading circuit controllers using polymorphic gates is proposed in the paper. The use of the algorithm is demonstrated on an example controller. This controller was physically realised and its functionality (especially in the transient state) was verified.
{"title":"Gracefully Degrading Circuit Controllers Based on Polytronics","authors":"R. Ruzicka","doi":"10.1109/DSD.2010.92","DOIUrl":"https://doi.org/10.1109/DSD.2010.92","journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools"},"publicationDate":"2010-09-01","platform":"Semanticscholar","paperid":"115849620"}
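To make the notion of a polymorphic gate concrete, the Python sketch below models a single gate whose logic function depends on a supply-voltage condition. The NAND/NOR pairing, the `low_vdd` flag, and the function names are assumptions for illustration only. The abstract does not say which gate functions the paper's controller uses, and the paper's design algorithm is not reproduced here.

```python
def poly_nand_nor(a, b, low_vdd):
    """Hypothetical polymorphic gate: behaves as NAND at nominal Vdd
    and as NOR when Vdd is low. Inputs and output are 0 or 1."""
    if low_vdd:
        return int(not (a or b))   # NOR in the degraded mode
    return int(not (a and b))      # NAND in the normal mode


def truth_table(low_vdd):
    """Outputs for inputs (0,0), (0,1), (1,0), (1,1) in that order."""
    return [poly_nand_nor(a, b, low_vdd) for a in (0, 1) for b in (0, 1)]


nominal = truth_table(low_vdd=False)
degraded = truth_table(low_vdd=True)
```

The same gate instance yields two different truth tables, so next-state logic built from such gates can switch behaviour without any explicit reconfiguration circuitry.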
Despite their huge potential, value predictors have not been used in modern processors. This is partially due to the complex structures associated with such predictors. In this paper we study value predictors and investigate solutions to reduce storage requirements while imposing negligible coverage cost. Our solutions build on the observation that conventional value predictors do not utilize storage efficiently, as they allocate too much space for small and frequently appearing values. We measure data width requirements and entropy in a subset of predictor resources and show that values stored in predictors have limited sizes and very low entropy. We exploit this behavior and suggest different bit-sharing solutions for predictors storing single-byte values.
{"title":"Storage-Aware Value Prediction","authors":"M. Salehi, A. Baniasadi","doi":"10.1109/DSD.2010.70","DOIUrl":"https://doi.org/10.1109/DSD.2010.70","journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools"},"publicationDate":"2010-09-01","platform":"Semanticscholar","paperid":"121721063"}
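The two measurements named in this abstract, data width and entropy, can be sketched in a few lines of Python. The sample values below are invented and the helper names are hypothetical. The paper's actual measurement setup and predictor organisation are not described in the abstract, so this only shows what "limited size and low entropy" means numerically, assuming unsigned integer values.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits per value, of the empirical distribution."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def bytes_needed(value):
    """Minimum whole bytes needed to store a non-negative integer."""
    return max(1, (value.bit_length() + 7) // 8)


# Hypothetical contents of a predictor table: mostly small, repeated values.
stored = [0, 0, 0, 0, 1, 1, 4, 255]
entropy = shannon_entropy(stored)
widths = [bytes_needed(v) for v in stored]
```

For this sample every value fits in a single byte and the entropy is 1.75 bits per value, far below the 32 or 64 bits a conventional entry would reserve.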
C. Cornelius, Philipp Gorski, S. Kubisch, D. Timmermann
Several alternative mesh-type topologies have been published for use in Networks-on-Chip. Due to their regularity, mesh-type topologies often serve as a foundation to investigate new ideas or to customize the topology to application-specific needs. This paper analyzes existing mesh-type topologies and compares their characteristics in terms of communication and implementation costs. Furthermore, this paper proposes BEAM (Border-Enhanced Mesh), a mesh-type topology for Networks-on-Chip. BEAM uses concentration while requiring only low-radix routers. To this end, additional resources are connected to the outer boundaries of a conventional mesh. As a result, overall bandwidth is traded off against hardware overhead. In conclusion, simulation and synthesis results show that the conventional mesh stands out due to its communication performance, whereas clustered and concentrated topologies offer the least hardware overhead. BEAM lies in between and is an option to balance hardware costs and communication performance.
{"title":"Trading Hardware Overhead for Communication Performance in Mesh-Type Topologies","authors":"C. Cornelius, Philipp Gorski, S. Kubisch, D. Timmermann","doi":"10.1109/DSD.2010.67","DOIUrl":"https://doi.org/10.1109/DSD.2010.67","journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools"},"publicationDate":"2010-09-01","platform":"Semanticscholar","paperid":"126169875"}
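The trade-off between router count and hop distance that this abstract refers to can be illustrated with a small Python sketch. It compares a conventional 4x4 mesh (one resource per router) with a 2x2 concentrated mesh (four resources per router) for the same 16 resources. The sizes and the function name are assumptions for illustration, BEAM itself is not modelled, and only router-to-router hops under minimal (Manhattan) routing are counted.

```python
from itertools import product


def average_hops(width, height):
    """Average Manhattan distance between all pairs of distinct routers
    in a width x height mesh."""
    nodes = list(product(range(width), range(height)))
    total = pairs = 0
    for i, (x1, y1) in enumerate(nodes):
        for x2, y2 in nodes[i + 1:]:
            total += abs(x1 - x2) + abs(y1 - y2)
            pairs += 1
    return total / pairs


# Hypothetical system with 16 resources.
mesh_routers, mesh_hops = 4 * 4, average_hops(4, 4)      # 1 resource per router
cmesh_routers, cmesh_hops = 2 * 2, average_hops(2, 2)    # 4 resources per router
```

The concentrated variant needs a quarter of the routers and roughly half the average hop distance, but each router must serve more local ports and the fewer links are shared by more traffic, which is the kind of bandwidth-versus-overhead trade-off the abstract describes.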
Delay faults in content addressable memories (CAMs) are a major concern in many applications such as network routers, IP filters, longest prefix matching (LPM) search engines, and cache tags, where high-speed data search is important. This creates the need to analyze critical paths and to detect the associated faults using a minimum number of test patterns. This paper proposes a test method to detect critical path delay faults in CAM systems using a newly proposed low-power TCAM cell structure. The proposed complement bit walk (CBW) algorithms have low time complexity, requiring 3m+n and 2m+2n operations. The fault simulation of the given TCAM system provides 100% fault coverage for the write, search, and pseudo logic faults.
{"title":"Path-Delay Fault Testing in Embedded Content Addressable Memories","authors":"P. Manikandan, Bjørn B. Larsen, E. Aas","doi":"10.1109/DSD.2010.48","DOIUrl":"https://doi.org/10.1109/DSD.2010.48","journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools"},"publicationDate":"2010-09-01","platform":"Semanticscholar","paperid":"126034672"}
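The two operation counts quoted in this abstract can be compared directly. The abstract does not define m and n, so reading m as the number of words and n as the word width in bits is an assumption made here only to pick example numbers. The algebra itself holds whatever the two symbols stand for.

```latex
\begin{align*}
(3m + n) - (2m + 2n) &= m - n, \\
\text{so}\quad 2m + 2n &< 3m + n \iff m > n. \\[4pt]
\text{Example } (m = 1024,\ n = 32):\quad
3m + n &= 3104, \qquad 2m + 2n = 2112 .
\end{align*}
```

Under that assumed reading, a memory with many more words than bits per word would be tested with fewer operations by the 2m+2n variant.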