Early power modeling and analysis using electronic system-level methodology enables designers to explore energy saving opportunities more efficiently at a higher abstraction level. However, power modeling for third party IPs are challenging due to the limited observability and unknown architecture details. To model the data dependency for blackbox IPs, several works rely on adopting Hamming distance of input data to approximate the switching activity, which might be not enough for modeling complex IPs such as image signal processors (ISP). This work introduces a content-aware line-based power modeling method for ISP by training an associated energy table. To effectively estimate ISP energy consumption which involves many two-dimensional data processing, this work presents a direct energy-mapping strategy using pixel luminance and gradient. Moreover, an iterative box-constrained least-squares estimation and its associated constraint refinement scheme is proposed to increase the robustness of the trained energy table even with limited training data. Simulation results show that the proposed method can reduce at least 11.54% of average error and 55.52% of max error as compared to the existing content-blind power model.
{"title":"Content-aware line-based power modeling methodology for image signal processor","authors":"Chun-Wei Chen, Ming-Der Shieh, Juin-Ming Lu, Hsun-Lun Huang, Yao-Hua Chen","doi":"10.1109/SOCC.2017.8226075","DOIUrl":"https://doi.org/10.1109/SOCC.2017.8226075","url":null,"abstract":"Early power modeling and analysis using electronic system-level methodology enables designers to explore energy saving opportunities more efficiently at a higher abstraction level. However, power modeling for third party IPs are challenging due to the limited observability and unknown architecture details. To model the data dependency for blackbox IPs, several works rely on adopting Hamming distance of input data to approximate the switching activity, which might be not enough for modeling complex IPs such as image signal processors (ISP). This work introduces a content-aware line-based power modeling method for ISP by training an associated energy table. To effectively estimate ISP energy consumption which involves many two-dimensional data processing, this work presents a direct energy-mapping strategy using pixel luminance and gradient. Moreover, an iterative box-constrained least-squares estimation and its associated constraint refinement scheme is proposed to increase the robustness of the trained energy table even with limited training data. Simulation results show that the proposed method can reduce at least 11.54% of average error and 55.52% of max error as compared to the existing content-blind power model.","PeriodicalId":366264,"journal":{"name":"2017 30th IEEE International System-on-Chip Conference (SOCC)","volume":"22 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125782386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-09-01DOI: 10.1109/SOCC.2017.8226057
Lizheng Liu, Yi Jin, Yi Liu, Ning Ma, Z. Zou, Lirong Zheng
The scalable and massively parallel computing systems composed of many processors, which are connected on chips that will become more and more complex and unreliable. This paper presents a bio-inspired error tolerance framework and three design principles based on the Autonomous Error Tolerant (AET) architecture. A nearby error perception mechanism is carefully designed to detect faults and an initiative evolutions strategy is studied to handle unrecoverable errors. A circuit backup mechanism is proposed for generating an effective way by setting the routing rules to bypass the failed link or node to achieve fault tolerance capabilities. The print circuit board (PCB) prototype is designed and implemented based on a reconfigurable and scalable control-centric dual-core embedded processor (ReSC). Different testing programs associating fault-detection or self-backup schemes and routing algorithms are explored in the platform. Experimental results show that error perceptron can detect the faults and reassign the task for other remaining free and healthy AET cell through Network-on-chip (NoC) when faults occur at the AET cell. The system can complete error recovery within 3 seconds, the paper shows the error-tolerant capability of the proposed architecture is better than the conventional multi-modular redundant system.
{"title":"Designing bio-inspired autonomous error-tolerant massively parallel computing architectures","authors":"Lizheng Liu, Yi Jin, Yi Liu, Ning Ma, Z. Zou, Lirong Zheng","doi":"10.1109/SOCC.2017.8226057","DOIUrl":"https://doi.org/10.1109/SOCC.2017.8226057","url":null,"abstract":"The scalable and massively parallel computing systems composed of many processors, which are connected on chips that will become more and more complex and unreliable. This paper presents a bio-inspired error tolerance framework and three design principles based on the Autonomous Error Tolerant (AET) architecture. A nearby error perception mechanism is carefully designed to detect faults and an initiative evolutions strategy is studied to handle unrecoverable errors. A circuit backup mechanism is proposed for generating an effective way by setting the routing rules to bypass the failed link or node to achieve fault tolerance capabilities. The print circuit board (PCB) prototype is designed and implemented based on a reconfigurable and scalable control-centric dual-core embedded processor (ReSC). Different testing programs associating fault-detection or self-backup schemes and routing algorithms are explored in the platform. Experimental results show that error perceptron can detect the faults and reassign the task for other remaining free and healthy AET cell through Network-on-chip (NoC) when faults occur at the AET cell. The system can complete error recovery within 3 seconds, the paper shows the error-tolerant capability of the proposed architecture is better than the conventional multi-modular redundant system.","PeriodicalId":366264,"journal":{"name":"2017 30th IEEE International System-on-Chip Conference (SOCC)","volume":"392 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126748216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-09-01DOI: 10.1109/SOCC.2017.8226051
Hyunjong Kim, Yujin Park, Han Yang, Suhwan Kim
This paper presents a constant bandwidth switched-capacitor programmable-gain amplifier (SC-PGA). By using an adaptive Miller compensation technique for the SC-PGA, our SC-PGA achieves low power consumption and high linearity at various gain conditions. The post-layout simulation results with 0.18 μm CMOS process show that power efficiency is tripled over the SC-PGA without the adaptive Miller compensation technique at 12 V/V gain without degrading performance. Power consumption is 2.8 mW at 3.3 V analog and 1.8 V digital supply voltage.
{"title":"A constant bandwidth switched-capacitor programmable-gain amplifier utilizing adaptive miller compensation technique","authors":"Hyunjong Kim, Yujin Park, Han Yang, Suhwan Kim","doi":"10.1109/SOCC.2017.8226051","DOIUrl":"https://doi.org/10.1109/SOCC.2017.8226051","url":null,"abstract":"This paper presents a constant bandwidth switched-capacitor programmable-gain amplifier (SC-PGA). By using an adaptive Miller compensation technique for the SC-PGA, our SC-PGA achieves low power consumption and high linearity at various gain conditions. The post-layout simulation results with 0.18 μm CMOS process show that power efficiency is tripled over the SC-PGA without the adaptive Miller compensation technique at 12 V/V gain without degrading performance. Power consumption is 2.8 mW at 3.3 V analog and 1.8 V digital supply voltage.","PeriodicalId":366264,"journal":{"name":"2017 30th IEEE International System-on-Chip Conference (SOCC)","volume":"431 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122873740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-09-01DOI: 10.1109/SOCC.2017.8226074
S. Kadiyala, V. Pudi, S. Lam
Recently, compressive sensing has attracted a lot of research interest due to its potential for realizing lightweight image compression solutions. Approximate or inexact computing on the other hand has been successfully applied to lower the complexity of hardware architectures for applications where a certain amount of performance degradation is acceptable (e.g. lossy image compression). In our work, we present a novel method for compressive sensing using approximate computing paradigm, in order to realize a hardware-efficient image compression architecture. We adopt Gaussian Random matrix based compression in our work. Library based pruning is used to realize the approximate compression architecture. Further we present a multi-objective optimization method to fine tune our pruning and increase performance of architecture. When compared to the baseline architecture that uses regular multipliers on 65-nm CMOS technology, our proposed image compression architecture achieves 43% area and 54% power savings with minimal PSNR degradation.
{"title":"Approximate compressed sensing for hardware-efficient image compression","authors":"S. Kadiyala, V. Pudi, S. Lam","doi":"10.1109/SOCC.2017.8226074","DOIUrl":"https://doi.org/10.1109/SOCC.2017.8226074","url":null,"abstract":"Recently, compressive sensing has attracted a lot of research interest due to its potential for realizing lightweight image compression solutions. Approximate or inexact computing on the other hand has been successfully applied to lower the complexity of hardware architectures for applications where a certain amount of performance degradation is acceptable (e.g. lossy image compression). In our work, we present a novel method for compressive sensing using approximate computing paradigm, in order to realize a hardware-efficient image compression architecture. We adopt Gaussian Random matrix based compression in our work. Library based pruning is used to realize the approximate compression architecture. Further we present a multi-objective optimization method to fine tune our pruning and increase performance of architecture. When compared to the baseline architecture that uses regular multipliers on 65-nm CMOS technology, our proposed image compression architecture achieves 43% area and 54% power savings with minimal PSNR degradation.","PeriodicalId":366264,"journal":{"name":"2017 30th IEEE International System-on-Chip Conference (SOCC)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129940663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-09-01DOI: 10.1109/SOCC.2017.8225998
Kyeongryeol Bong, K. Lee, H. Yoo
A tile-based semi-global matching (SGM) processor with lossless data compression is proposed. The 8×8 tile-base processing and the P2-less data compression can reduce the external memory access by 85% without any change in the processing result. In addition, the P2-less data compression can decrease on-chip SRAM size by 50%. Implemented in 65nm CMOS technology, the 6.3mm2 chip consumes 288mW and supports 590MDE/s (million disparity estimation per second) when processing 640×360 resolution with 64-disparity range at 40fps real-time operation.
{"title":"A 590MDE/s semi-global matching processor with lossless data compression","authors":"Kyeongryeol Bong, K. Lee, H. Yoo","doi":"10.1109/SOCC.2017.8225998","DOIUrl":"https://doi.org/10.1109/SOCC.2017.8225998","url":null,"abstract":"A tile-based semi-global matching (SGM) processor with lossless data compression is proposed. The 8×8 tile-base processing and the P2-less data compression can reduce the external memory access by 85% without any change in the processing result. In addition, the P2-less data compression can decrease on-chip SRAM size by 50%. Implemented in 65nm CMOS technology, the 6.3mm2 chip consumes 288mW and supports 590MDE/s (million disparity estimation per second) when processing 640×360 resolution with 64-disparity range at 40fps real-time operation.","PeriodicalId":366264,"journal":{"name":"2017 30th IEEE International System-on-Chip Conference (SOCC)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126705363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-09-01DOI: 10.1109/SOCC.2017.8226026
Mostafa Said, Hossam Hassan, Hyungwon Kim, Mostafa Khamis
Power consumption reduction is a very critical challenge in nowadays nanoscale circuits. In this paper, a new power reduction approach is demonstrated. This approach is originally based on the idea of TSV multiplexing in 3D-ICs where two or more signals can flow through one TSV instead of multiple TSVs. Based on that behavior, the possibility of power reduction of this circuit is discovered and its generalization to any wire, i.e., wire multiplexing, is detailed. Also, an analytical power model for this circuit is developed to predict its power consumption behavior. Further, and by means of Cadence-Spectre simulations on 65 nm technology and also using the developed analytical model, the power reduction of wire multiplexing technique could be proved and verified.
{"title":"A novel power reduction technique using wire multiplexing","authors":"Mostafa Said, Hossam Hassan, Hyungwon Kim, Mostafa Khamis","doi":"10.1109/SOCC.2017.8226026","DOIUrl":"https://doi.org/10.1109/SOCC.2017.8226026","url":null,"abstract":"Power consumption reduction is a very critical challenge in nowadays nanoscale circuits. In this paper, a new power reduction approach is demonstrated. This approach is originally based on the idea of TSV multiplexing in 3D-ICs where two or more signals can flow through one TSV instead of multiple TSVs. Based on that behavior, the possibility of power reduction of this circuit is discovered and its generalization to any wire, i.e., wire multiplexing, is detailed. Also, an analytical power model for this circuit is developed to predict its power consumption behavior. Further, and by means of Cadence-Spectre simulations on 65 nm technology and also using the developed analytical model, the power reduction of wire multiplexing technique could be proved and verified.","PeriodicalId":366264,"journal":{"name":"2017 30th IEEE International System-on-Chip Conference (SOCC)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126460899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-09-01DOI: 10.1109/SOCC.2017.8226056
Jin Hee Kim, Brett Grady, Ruolong Lian, J. Brothers, J. Anderson
A deep-learning inference accelerator is synthesized from a C-language software program parallelized with Pthreads. The software implementation uses the well-known producer/consumer model with parallel threads interconnected by FIFO queues. The LegUp high-level synthesis (HLS) [1] tool synthesizes threads into parallel FPGA hardware, translating software parallelism into spatial parallelism. A complete system is generated where convolution, pooling and padding are realized in the synthesized accelerator, with remaining tasks executing on an embedded ARM processor. The accelerator incorporates reduced precision, and a novel approach for zero-weight-skipping in convolution. On a mid-sized Intel Arria 10 SoC FPGA, peak performance on VGG-16 is 138 effective GOPS.
{"title":"FPGA-based CNN inference accelerator synthesized from multi-threaded C software","authors":"Jin Hee Kim, Brett Grady, Ruolong Lian, J. Brothers, J. Anderson","doi":"10.1109/SOCC.2017.8226056","DOIUrl":"https://doi.org/10.1109/SOCC.2017.8226056","url":null,"abstract":"A deep-learning inference accelerator is synthesized from a C-language software program parallelized with Pthreads. The software implementation uses the well-known producer/consumer model with parallel threads interconnected by FIFO queues. The LegUp high-level synthesis (HLS) [1] tool synthesizes threads into parallel FPGA hardware, translating software parallelism into spatial parallelism. A complete system is generated where convolution, pooling and padding are realized in the synthesized accelerator, with remaining tasks executing on an embedded ARM processor. The accelerator incorporates reduced precision, and a novel approach for zero-weight-skipping in convolution. On a mid-sized Intel Arria 10 SoC FPGA, peak performance on VGG-16 is 138 effective GOPS.","PeriodicalId":366264,"journal":{"name":"2017 30th IEEE International System-on-Chip Conference (SOCC)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124112243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-09-01DOI: 10.1109/SOCC.2017.8226000
Daniel Flórez, Martha Johanna Sepúlveda
Cardiovascular Diseases (CVDs) are a major concern. They are responsible for 35% of deaths and for costs of billions of dollars worldwide. Prevention of CVDs has become a global priority. Comprehensive use of wearable devices operating in the context of Internet-of-Things (IoT) paradigm is the key to monitor, diagnose and treat CVDs. Most of the previous approaches propose wearables only for non-invasive blood pressure and heart rate monitoring. However, in order to improve the quality of the detection and prevention of CVDs, this measurements must be combined with oximeter monitoring (SPO2). In this work we propose BlooXY, a wearable device that operates in the context of IoT to measure the blood pressure, oximetry and heart rate. We show that BlooXY is an efficient aid in the prevention, control and treatment of CVDs.
{"title":"BlooXY: On a non-invasive blood monitor for the IoT context","authors":"Daniel Flórez, Martha Johanna Sepúlveda","doi":"10.1109/SOCC.2017.8226000","DOIUrl":"https://doi.org/10.1109/SOCC.2017.8226000","url":null,"abstract":"Cardiovascular Diseases (CVDs) are a major concern. They are responsible for 35% of deaths and for costs of billions of dollars worldwide. Prevention of CVDs has become a global priority. Comprehensive use of wearable devices operating in the context of Internet-of-Things (IoT) paradigm is the key to monitor, diagnose and treat CVDs. Most of the previous approaches propose wearables only for non-invasive blood pressure and heart rate monitoring. However, in order to improve the quality of the detection and prevention of CVDs, this measurements must be combined with oximeter monitoring (SPO2). In this work we propose BlooXY, a wearable device that operates in the context of IoT to measure the blood pressure, oximetry and heart rate. We show that BlooXY is an efficient aid in the prevention, control and treatment of CVDs.","PeriodicalId":366264,"journal":{"name":"2017 30th IEEE International System-on-Chip Conference (SOCC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114991318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-09-01DOI: 10.1109/SOCC.2017.8226059
A. Srivatsa, Sven Rheindt, Thomas Wild, A. Herkersdorf
The need for faster and more energy efficient computing has led us to the multicore era with distributed shared memory hierarchies. The primary goal is to distribute parallel tasks onto multiple processing elements to collectively achieve shorter execution times at lower frequencies and supply voltages when compared to a single-core architecture. Major challenges of this approach are how to achieve local, low latency memory accesses and low overheads for coherence and synchronization management. We believe that enabling global coherence in tiled many-core architectures does not scale in a cost efficient manner and isn't even required for applications with limited degrees of parallelism. In this paper, we propose a novel region based cache coherence scheme, where coherence is provided by hardware directories within a flexibly sized but confined set of compute and memory tiles. We also show that data placement and task mapping have a huge impact on the application performance, and hence should be considered in conjunction with region based coherence. The approach is evaluated by means of a high level simulation model using workloads from PARSEC. Experiments demonstrate that our region based approach with multiple compute tiles increases performance by a factor of up to 2.5 compared to a single tile structure with nominally identical computing and memory resources. Thus the independent local memory accesses, which are effectively increasing the memory bandwidth, usually outweigh the penalties of inter-tile remote memory accesses. Our approach also reduces the directory structures significantly compared to traditional schemes, making it scalable for large MPSoCs (eg. by 41.4% for a 16 tile system with 4 tiles per region). Considering data-to-task-placement, our investigations show that it can lead to performance variations up to a factor of 12.7.
{"title":"Region based cache coherence for tiled MPSoCs","authors":"A. Srivatsa, Sven Rheindt, Thomas Wild, A. Herkersdorf","doi":"10.1109/SOCC.2017.8226059","DOIUrl":"https://doi.org/10.1109/SOCC.2017.8226059","url":null,"abstract":"The need for faster and more energy efficient computing has led us to the multicore era with distributed shared memory hierarchies. The primary goal is to distribute parallel tasks onto multiple processing elements to collectively achieve shorter execution times at lower frequencies and supply voltages when compared to a single-core architecture. Major challenges of this approach are how to achieve local, low latency memory accesses and low overheads for coherence and synchronization management. We believe that enabling global coherence in tiled many-core architectures does not scale in a cost efficient manner and isn't even required for applications with limited degrees of parallelism. In this paper, we propose a novel region based cache coherence scheme, where coherence is provided by hardware directories within a flexibly sized but confined set of compute and memory tiles. We also show that data placement and task mapping have a huge impact on the application performance, and hence should be considered in conjunction with region based coherence. The approach is evaluated by means of a high level simulation model using workloads from PARSEC. Experiments demonstrate that our region based approach with multiple compute tiles increases performance by a factor of up to 2.5 compared to a single tile structure with nominally identical computing and memory resources. Thus the independent local memory accesses, which are effectively increasing the memory bandwidth, usually outweigh the penalties of inter-tile remote memory accesses. Our approach also reduces the directory structures significantly compared to traditional schemes, making it scalable for large MPSoCs (eg. by 41.4% for a 16 tile system with 4 tiles per region). Considering data-to-task-placement, our investigations show that it can lead to performance variations up to a factor of 12.7.","PeriodicalId":366264,"journal":{"name":"2017 30th IEEE International System-on-Chip Conference (SOCC)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125973391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-09-01DOI: 10.1109/SOCC.2017.8226019
Nisha Jacob, J. Wittmann, Johann Heyszl, Robert Hesselbarth, F. Wilde, Michael Pehl, G. Sigl, K. Fischer
System-on-Chips which include FPGAs are important platforms for critical applications since they provide significant software performance through multi-core CPUs as well as high versatility through integrated FPGAs. Those integrated FP-GAs allow to update the programmable hardware functionality, e.g. to include new communication interfaces or to update cryptographic accelerators during the life-time of devices. Updating software as well as hardware configuration is required for critical applications such as e.g. industrial control devices or vehicles with long life-times. Such updates must be authenticated and possibly encrypted. One way to achieve this is to rely on static FPGA manufacturer-provided cryptography and respective master keys. However, in this contribution, we show how to retrofit Xilinx Zynq FPGAs with an alternative cryptographic accelerator and how to establish device-individual keys using Physical Unclonable Function (PUF) technology. These two key aspects reduce the required trust in manufacturer-provided security features while increasing the security by binding configurations to a specific device.
{"title":"Securing FPGA SoC configurations independent of their manufacturers","authors":"Nisha Jacob, J. Wittmann, Johann Heyszl, Robert Hesselbarth, F. Wilde, Michael Pehl, G. Sigl, K. Fischer","doi":"10.1109/SOCC.2017.8226019","DOIUrl":"https://doi.org/10.1109/SOCC.2017.8226019","url":null,"abstract":"System-on-Chips which include FPGAs are important platforms for critical applications since they provide significant software performance through multi-core CPUs as well as high versatility through integrated FPGAs. Those integrated FP-GAs allow to update the programmable hardware functionality, e.g. to include new communication interfaces or to update cryptographic accelerators during the life-time of devices. Updating software as well as hardware configuration is required for critical applications such as e.g. industrial control devices or vehicles with long life-times. Such updates must be authenticated and possibly encrypted. One way to achieve this is to rely on static FPGA manufacturer-provided cryptography and respective master keys. However, in this contribution, we show how to retrofit Xilinx Zynq FPGAs with an alternative cryptographic accelerator and how to establish device-individual keys using Physical Unclonable Function (PUF) technology. These two key aspects reduce the required trust in manufacturer-provided security features while increasing the security by binding configurations to a specific device.","PeriodicalId":366264,"journal":{"name":"2017 30th IEEE International System-on-Chip Conference (SOCC)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129407856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}