Farah Naz Taher, Joseph Callenes-Sloan, Benjamin Carrión Schäfer
The continuous pursuit of higher performance and energy efficiency has led to heterogeneous SoCs that contain multiple dedicated hardware accelerators. These accelerators exploit the inherent parallelism of tasks and are often tolerant of inaccuracies in their outputs, e.g., in image and digital signal processing applications. At the same time, permanent faults are escalating due to process scaling and power restrictions, leading to erroneous outputs. To address this issue, we propose a low-cost, universal fault-recovery/repair method that uses supervised machine learning techniques to ameliorate the effect of permanent fault(s) in hardware accelerators that can tolerate inexact outputs. The proposed compensation model does not require any information about the accelerator and is highly scalable with low area overhead. Experimental results show that the proposed method improves the accuracy by 50% and decreases the overall mean error rate by 90% with an area overhead of 5% compared to execution without fault compensation.
{"title":"A Machine Learning based Hard Fault Recuperation Model for Approximate Hardware Accelerators","authors":"Farah Naz Taher, Joseph Callenes-Sloan, Benjamin Carrión Schäfer","doi":"10.1145/3195970.3195974","DOIUrl":"https://doi.org/10.1145/3195970.3195974","url":null,"PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"1 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87831720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chixiao Chen, Huwan Peng, Xindi Liu, Hongwei Ding, C. R. Shi
This paper presents an instruction and Fabric Programmable Neuron Array (iFPNA) architecture, its 28nm CMOS chip prototype, and a compiler for the on-chip acceleration of a variety of deep learning neural networks (DNNs), including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and fully connected (FC) networks. The iFPNA architecture combines instruction-level programmability, as in an Instruction Set Architecture (ISA), with logic-level reconfigurability, as in a Field-Programmable Gate Array (FPGA), in a sliced structure for scalability. Four data-flow models, namely weight stationary, input stationary, row stationary, and tunnel stationary, are described as abstractions of the various data and computational dependences in DNNs. The iFPNA compiler partitions a large DNN into smaller networks, each of which is mapped to, optimized for, and compiled into code for the underlying iFPNA processor using one or a mixture of the four data-flow models. Experimental results show that state-of-the-art large CNNs, RNNs, and FC networks can be mapped to the iFPNA processor while achieving near-ASIC performance.
{"title":"Exploring the Programmability for Deep Learning Processors: from Architecture to Tensorization","authors":"Chixiao Chen, Huwan Peng, Xindi Liu, Hongwei Ding, C. R. Shi","doi":"10.1145/3195970.3196049","DOIUrl":"https://doi.org/10.1145/3195970.3196049","url":null,"PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"62 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87072441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wei-Lin Wang, Tseng-Yi Chen, Yuan-Hao Chang, H. Wei, W. Shih
Due to the decreasing endurance of flash chips, the lifetime of flash drives has become a critical issue. To resolve this issue, various techniques such as wear-leveling and error correction codes have been proposed to reduce the bit error rates of flash storage devices. In contrast to these techniques, we observe that minimizing write amplification is another promising direction for enhancing the lifetime of a flash storage device. However, the trend toward large-page flash memory exacerbates the write amplification issue. In this work, we present a compression-based management design to deal with compressed data updates and internal fragmentation in flash pages. With the support of data reduction techniques, the design minimizes write amplification by updating only the modified part of a flash page, and the reduction in write amplification becomes more significant as flash page sizes grow. This design is orthogonal to wear-leveling and error correction techniques and thus can cooperate with them to further enhance the lifetime of a flash device. A series of experiments demonstrates that the proposed design can effectively improve the lifetime of a flash storage device by reducing write amplification.
{"title":"Minimizing Write Amplification to Enhance Lifetime of Large-page Flash-Memory Storage Devices","authors":"Wei-Lin Wang, Tseng-Yi Chen, Yuan-Hao Chang, H. Wei, W. Shih","doi":"10.1145/3195970.3196076","DOIUrl":"https://doi.org/10.1145/3195970.3196076","url":null,"PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"1 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86189042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Satwik Patnaik, M. Ashraf, J. Knechtel, O. Sinanoglu
Split manufacturing (SM) seeks to protect against piracy of intellectual property (IP) in chip designs. Here we propose a scheme to manipulate both placement and routing in an intertwined manner, thereby increasing the resilience of SM layouts. The key stages of our scheme are to (partially) randomize a design, place and route the erroneous netlist, and restore the original design by re-routing the BEOL. Based on state-of-the-art proximity attacks, we demonstrate that our scheme notably outperforms the prior art (i.e., 0% correct connection rates). Our scheme induces controllable PPA overheads and lowers commercial cost (the latter by splitting at higher layers).
{"title":"Raise Your Game for Split Manufacturing: Restoring the True Functionality Through BEOL","authors":"Satwik Patnaik, M. Ashraf, J. Knechtel, O. Sinanoglu","doi":"10.1145/3195970.3196100","DOIUrl":"https://doi.org/10.1145/3195970.3196100","url":null,"PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"46 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77760345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chunfeng Liu, Bing Li, Tsung-Yi Ho, K. Chakrabarty, Ulf Schlichtmann
Flow-based microfluidic biochips are gaining traction in the microfluidics community since they enable efficient and low-cost biochemical experiments. These highly integrated lab-on-a-chip systems, however, suffer from manufacturing defects, which cause some chips to malfunction. To test biochips after manufacturing, air pressure is applied to the input ports of a chip and predetermined test vectors are used to change the states of microvalves in the chip. Pressure meters are connected to the output ports to measure pressure values, which are compared with expected values to detect errors. To reduce the cost of the test platform, the number of pressure sources and meters should be reduced. We propose a design-for-testability (DFT) technique that enables a test procedure with only a single pressure source and a single pressure meter. Furthermore, the valves inserted for DFT share control channels with valves in the original chip so that no additional control signals are required. Simulation results demonstrate that this technique successfully generates efficient chip architectures for single-source single-meter test in all experimental cases, reducing test cost while maintaining the performance of these chips in executing applications.
{"title":"Design-for-Testability for Continuous-Flow Microfluidic Biochips","authors":"Chunfeng Liu, Bing Li, Tsung-Yi Ho, K. Chakrabarty, Ulf Schlichtmann","doi":"10.1145/3195970.3196025","DOIUrl":"https://doi.org/10.1145/3195970.3196025","url":null,"PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"2 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77822075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shiqi Lian, Yinhe Han, Xiaoming Chen, Ying Wang, Hang Xiao
As a critical operation in robotics, motion planning consumes considerable time and energy, especially in a dynamic environment. With approaches based on general-purpose processors, it is hard to obtain a valid plan in real time. We present an accelerator to speed up collision detection, which accounts for over 90% of the computation time in motion planning. Via the octree-based roadmap representation, the accelerator can be reconfigured online and support large roadmaps. We additionally propose an effective algorithm to update the roadmap in a dynamic environment, together with a batched incremental processing approach to reduce the complexity of collision detection. Experimental results show that our accelerator achieves a 26.5X speedup over an existing CPU-based approach. With the incremental approach, the performance further improves by 10X while the solution quality is degraded by only 10%.
{"title":"Dadu-P: A Scalable Accelerator for Robot Motion Planning in a Dynamic Environment","authors":"Shiqi Lian, Yinhe Han, Xiaoming Chen, Ying Wang, Hang Xiao","doi":"10.1145/3195970.3196020","DOIUrl":"https://doi.org/10.1145/3195970.3196020","url":null,"PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"258 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82048734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This work is the first to analyze the security of split manufacturing using machine learning, based on data collected from layouts provided by industry, with 8 routing metal layers and significant variation in wire size and routing congestion across the layers. We consider many types of layout features for machine learning, including those obtained from placement, routing, and cell sizes. For the top split layer, we demonstrate dramatically better proximity-attack results than a recent prior work. We analyze the ranking of the features used by machine learning and show how the importance of features varies when moving to the lower layers. Since the runtime of our basic machine learning becomes prohibitively large for lower layers, we propose novel techniques to make it scalable with little sacrifice in the effectiveness of the attack.
{"title":"Analysis of Security of Split Manufacturing using Machine Learning","authors":"Boyu Zhang, J. Magaña, A. Davoodi","doi":"10.1145/3195970.3195991","DOIUrl":"https://doi.org/10.1145/3195970.3195991","url":null,"PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"42 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75444059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The exponential growth in PVT corners due to Moore's law scaling, and the increasing demand for consumer applications and longer battery life in mobile devices, have ushered in significant cost and power-related challenges for designing and productizing mobile chips within a predictable schedule. Two main reasons for this are the reliance on human decision-making to achieve the desired performance within the target area and power budget, and significant increases in the complexity of the human decision-making space. The problem is that, to date, human design experience has not been replaced by design automation tools, and tasks requiring experience of past designs are still being performed manually. In this paper we investigate how machine learning may be applied to develop tools that learn from experience just like human designers, thus automating tasks that still require human intervention. The potential advantage of the machine learning approach is the ability to scale with increasing complexity and therefore hold design time constant with the same manpower. Reinforcement Learning (RL) is a machine learning technique that allows us to mimic a human designer's ability to learn from experience and automate human decision-making, without loss in quality of the design, while making the design time independent of the complexity. In this paper we show how manual design tasks can be abstracted as RL problems. Based on the experience of applying RL to one of these problems, we show that RL can automatically achieve results similar to human designs, but on a predictable schedule. However, a major drawback is that the RL solution can require a prohibitively large number of iterations for training. If efficient training techniques can be developed for RL, it holds great promise for automating tasks requiring human experience. In this paper we present a Bayesian Optimization technique for reducing the RL training time.
{"title":"Invited: Efficient Reinforcement Learning for Automating Human Decision-Making in SoC Design","authors":"Shankar Sadasivam, Zhuo Chen, Jinwon Lee, Rajeev Jain","doi":"10.1145/3195970.3199855","DOIUrl":"https://doi.org/10.1145/3195970.3199855","url":null,"PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"20 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83276238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Christian J. Dietrich, Achim Schmider, Oskar Pusz, G. P. Vayá, D. Lohmann
With shrinking structure sizes, soft-error mitigation has become a major challenge in the design and certification of safety-critical embedded systems. Their robustness is quantified by extensive fault-injection campaigns, which at the hardware level can nevertheless cover only a tiny part of the fault space. We suggest Fault-Masking Terms (MATEs) to effectively prune the fault space for gate-level fault-injection campaigns by using the (software-induced) hardware state to dynamically cut off benign faults. Our tool, applied to an AVR core and a size-optimized MSP430 implementation, shows that up to 21 percent of all SEUs at the flip-flop level are masked within one clock cycle.
{"title":"Cross-Layer Fault-Space Pruning for Hardware-Assisted Fault Injection","authors":"Christian J. Dietrich, Achim Schmider, Oskar Pusz, G. P. Vayá, D. Lohmann","doi":"10.1145/3195970.3196019","DOIUrl":"https://doi.org/10.1145/3195970.3196019","url":null,"PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"49 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90279901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Brucek Khailany, Evgeni Khmer, Rangharajan Venkatesan, Jason Clemons, J. Emer, Matthew R. Fojtik, Alicia Klinefelter, Michael Pellauer, N. Pinckney, Y. Shao, S. Srinath, Christopher Torng, S. Xi, Yanqing Zhang, B. Zimmer
A high-productivity digital VLSI flow for designing complex SoCs is presented. The flow includes high-level synthesis tools, an object-oriented library of synthesizable SystemC and C++ components, and a modular VLSI physical design approach based on fine-grained globally asynchronous locally synchronous (GALS) clocking. The flow was demonstrated on a 16nm FinFET testchip targeting machine learning and computer vision.
{"title":"INVITED: A Modular Digital VLSI Flow for High-Productivity SoC Design","authors":"Brucek Khailany, Evgeni Khmer, Rangharajan Venkatesan, Jason Clemons, J. Emer, Matthew R. Fojtik, Alicia Klinefelter, Michael Pellauer, N. Pinckney, Y. Shao, S. Srinath, Christopher Torng, S. Xi, Yanqing Zhang, B. Zimmer","doi":"10.1145/3195970.3199846","DOIUrl":"https://doi.org/10.1145/3195970.3199846","url":null,"PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"43 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90596414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}