Pub Date: 2020-07-01 · DOI: 10.1109/DAC18072.2020.9218491
Shaza Zeitouni, Emmanuel Stapf, H. Fereidooni, A. Sadeghi
PUFs are cost-effective security primitives that extract unique identifiers from integrated circuits. However, since their introduction, PUFs have been subject to modeling attacks based on machine learning. Recently, researchers have explored emerging nano-electronic technologies, e.g., memristors, to construct hybrid PUFs, which outperform CMOS-only PUFs and are claimed to be more resilient to modeling attacks. However, since such PUF designs are not open-source, the security claims remain dubious. In this paper, we reproduce a set of memristor-PUFs and extensively evaluate their unpredictability property. By leveraging state-of-the-art machine learning algorithms, we show that it is feasible to successfully model memristor-PUFs with high prediction rates of 98%. Even incorporating XOR gates to further strengthen the PUFs against modeling attacks has a negligible effect.
Title: On the Security of Strong Memristor-based Physically Unclonable Functions
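As an illustration of the kind of modeling attack the abstract describes, the sketch below fits a plain logistic-regression model to challenge-response pairs from a simulated linear arbiter-PUF (the standard CMOS strong-PUF model, not the memristor designs studied in the paper); the stage count, CRP budget, and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stages, n_crps = 32, 6000

# Simulated arbiter-PUF: response = sign(w . phi(challenge)),
# where phi is the usual parity-feature transform of the challenge.
w_true = rng.standard_normal(n_stages + 1)

def features(challenges):
    # phi_i = prod_{j >= i} (1 - 2 c_j), plus a constant bias feature
    signs = 1 - 2 * challenges                      # map {0,1} -> {+1,-1}
    phi = np.cumprod(signs[:, ::-1], axis=1)[:, ::-1]
    return np.hstack([phi, np.ones((len(challenges), 1))])

C = rng.integers(0, 2, size=(n_crps, n_stages))
X = features(C)
y = np.sign(X @ w_true)                             # observed responses

# Plain logistic regression trained by gradient descent on 5000 CRPs
Xtr, ytr, Xte, yte = X[:5000], y[:5000], X[5000:], y[5000:]
w = np.zeros(n_stages + 1)
for _ in range(300):
    p = 1 / (1 + np.exp(-Xtr @ w))
    w -= 0.1 * Xtr.T @ (p - (ytr + 1) / 2) / len(ytr)

acc = np.mean(np.sign(Xte @ w) == yte)
print(f"test prediction rate: {acc:.2%}")
```

Because the arbiter-PUF response is a linear threshold function of the parity features, a few thousand CRPs suffice for the model to predict held-out responses with high accuracy, which is the vulnerability the paper probes for memristor-based designs.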
Pub Date: 2020-07-01 · DOI: 10.1109/DAC18072.2020.9218686
Shuai Zhao, Zhe Jiang, Xiaotian Dai, I. Bate, I. Habli, Wanli Chang
General-purpose I/O is widely available on multi- and many-core systems. For real-time applications, I/O operations are often required to be timing-predictable, i.e., bounded in the worst case, and timing-accurate, i.e., occurring at (or near) an exact desired time instant. Unfortunately, both timing requirements are hard to achieve at the system level, especially on many-core architectures, due to the various latency and contention factors present in the path of instigating an I/O request. This paper considers a dedicated I/O co-processing unit and proposes two scheduling methods, with the necessary hardware support implemented. It is the first work that guarantees timing predictability and maximises timing accuracy of I/O tasks in multi- and many-core systems.
Title: Timing-Accurate General-Purpose I/O for Multi- and Many-Core Systems: Scheduling and Hardware Support
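A toy sketch of the predictability-versus-accuracy distinction drawn in the abstract (not the paper's scheduling methods): if the latency of the I/O path is bounded within [bcl, wcl], releasing a request wcl early guarantees the operation never completes late (predictability), while the residual timing inaccuracy is bounded by wcl − bcl.

```python
# Toy model: release a request early by the worst-case latency (wcl);
# the best-case latency (bcl) then bounds how early it can complete.
def release_time(t_target, wcl):
    return t_target - wcl

def completion_window(t_target, bcl, wcl):
    t_rel = release_time(t_target, wcl)
    return (t_rel + bcl, t_rel + wcl)   # (earliest, latest) completion

lo, hi = completion_window(t_target=1000, bcl=40, wcl=70)
print(lo, hi)   # -> 970 1000: never late, at most wcl - bcl = 30 early
```

Shrinking that wcl − bcl window is what dedicated hardware support buys; pure software release-time adjustment can only shift the window, not narrow it.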
Pub Date: 2020-07-01 · DOI: 10.1109/DAC18072.2020.9218704
Aibin Yan, Xiangfeng Feng, Xiaohu Zhao, Han Zhou, Jie Cui, Zuobin Ying, P. Girard, X. Wen
This paper proposes a cost-effective, high-impedance-state (HIS)-insensitive, triple-node-upset (TNU)-tolerant and single-event-transient (SET)-filterable latch, namely HITTSFL, to ensure high reliability at low cost. The latch mainly comprises an output-level SET-filterable Schmitt trigger and three inverters that make the values stored in three parallel single-node-upset (SNU)-recoverable dual-interlocked storage cells (DICEs) converge at a common node to tolerate any possible TNU. To remain insensitive to the HIS, the latch does not use C-elements. Simulation results demonstrate the TNU tolerance and SET filterability of the proposed HITTSFL latch. Moreover, due to the use of clock-gating technologies and fewer transistors, the proposed latch reduces delay, power, and area by 76.65%, 6.16%, and 28.55%, respectively, compared with the state-of-the-art TNU-hardened latch (TNUHL), which cannot filter SETs.
Title: HITTSFL: Design of a Cost-Effective HIS-Insensitive TNU-Tolerant and SET-Filterable Latch for Safety-Critical Applications
Pub Date: 2020-07-01 · DOI: 10.1109/DAC18072.2020.9218659
Stefano Aldegheri, N. Bombieri, F. Fummi, Simone Girardi, R. Muradore, Nicola Piccinelli
We present a toolchain based on Docker and KubeEdge that enables containerization and orchestration of ROS-based robotic SW applications on heterogeneous and hierarchical HW architectures. The toolchain allows for verification of functional and real-time constraints through HW-in-the-loop simulation, and for automatic mapping exploration of the SW across Cloud-Server-Edge architectures. We present the results obtained for the deployment of a real case study composed of an ORB-SLAM application combined with local/global planners with obstacle avoidance for mobile robot navigation.
Title: Late Breaking Results: Enabling Containerized Computing and Orchestration of ROS-based Robotic SW Applications on Cloud-Server-Edge Architectures
Pub Date: 2020-07-01 · DOI: 10.1109/DAC18072.2020.9218675
Qi Liu, Tao Liu, Zihao Liu, Wujie Wen, Chengmo Yang
ReRAM-based neural network accelerators are a promising solution for handling memory- and computation-intensive deep learning workloads. However, they suffer from unique device errors. These errors can accumulate to massive levels at run time and cause significant accuracy drops. It is crucial to obtain the accelerator's fault status in real time before any proper repair mechanism can be applied. However, calibrating such statistical information is non-trivial because of the need for a large number of test patterns, long test time, and high test coverage, considering that complex errors may appear in million-to-billion weight parameters. In this paper, we leverage the concept of corner data, which can significantly confuse the decision making of a neural network model, as well as the training algorithm, to generate only a small set of test patterns that is tuned to be sensitive to different levels of error accumulation and accuracy loss. Experimental results show that our method can quickly and correctly report the fault status of a running accelerator, outperforming existing solutions in both detection efficiency and cost.
Title: Monitoring the Health of Emerging Neural Network Accelerators with Cost-effective Concurrent Test
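A minimal sketch of the corner-data intuition, with a hypothetical random linear classifier standing in for a real DNN: inputs whose top-2 score margin is smallest sit closest to a decision boundary, so their predictions flip far more readily under small parameter faults, making a handful of them cheap concurrent test patterns. All sizes and the fault magnitude are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for a deployed model: a 10-class linear classifier.
W = rng.standard_normal((10, 64))
X = rng.standard_normal((2000, 64))

def margins(W_, X_):
    logits = X_ @ W_.T
    top2 = np.sort(logits, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]          # top-1 score minus top-2 score

# "Corner data": the handful of inputs closest to a decision boundary.
corner_idx = np.argsort(margins(W, X))[:16]

# Inject a small random weight fault and compare prediction flip rates.
W_faulty = W + 0.05 * rng.standard_normal(W.shape)
pred = lambda W_, idx: np.argmax(X[idx] @ W_.T, axis=1)
corner_flips = np.mean(pred(W, corner_idx) != pred(W_faulty, corner_idx))
all_flips = np.mean(pred(W, np.arange(2000)) != pred(W_faulty, np.arange(2000)))
print(f"flip rate on corners: {corner_flips:.2f}, on all inputs: {all_flips:.2f}")
```

The paper's contribution is generating such boundary-hugging patterns for deep models via the training algorithm, tuned to distinct levels of error accumulation; the sketch only shows why small-margin inputs are sensitive detectors.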
Pub Date: 2020-07-01 · DOI: 10.1109/DAC18072.2020.9218577
Wenye Liu, Chip-Hong Chang, Fan Zhang, Xiaoxuan Lou
The convergence of edge computing and deep learning empowers endpoint hardware, or edge devices, to perform inference locally with the help of deep neural network (DNN) accelerators. This trend of edge intelligence invites new attack vectors, which are methodologically different from well-known software-oriented deep learning attacks such as adversarial examples. Current studies of threats on DNN hardware focus mainly on the manipulation of model parameters. Such manipulation is not stealthy, as it leaves non-erasable traces or creates conspicuous output patterns. In this paper, we present and investigate an imperceptible misclassification attack on DNN hardware that introduces infrequent instantaneous glitches into the clock signal. Compared with falsifying model parameters through permanent faults, intermittently corrupting targeted intermediate results of convolution layer(s) by disrupting the associated computations leaves no trace. We demonstrate our attack on nine state-of-the-art ImageNet models running on a Xilinx FPGA-based deep learning accelerator. With no knowledge about the models, our attack achieves over 98% misclassification on 8 out of 9 models with glitches launched into only 10% of the computation clock cycles. Given the model details and inputs, all the test images applied to ResNet50 can be successfully misclassified with no more than 1.7% glitch injection.
Title: Imperceptible Misclassification Attack on Deep Learning Accelerator by Glitch Injection
Pub Date: 2020-07-01 · DOI: 10.1109/DAC18072.2020.9218574
Rui Li, Heng Yu, Weixiong Jiang, Y. Ha
To obtain high reliability while avoiding the huge area overhead of traditional triple modular redundancy (TMR) methods in SRAM-based FPGAs, scrubbing-based methods reconfigure the configuration memory of each task just before its execution. However, because the FPGA reconfiguration module can only scrub one task at a time, parallel tasks may leave stringent timing requirements for scheduling their scrubbing processes. Scrubbing requests may thus be delayed or omitted, leading to a less reliable system. To address this issue, we propose a novel optimal DVFS-based scrubbing algorithm that adjusts the execution time of user tasks, significantly enhancing the chance of scheduling scrubbing successfully for parallel tasks. Besides, we develop an approximation algorithm to speed up the optimal version and a novel K-Means-based method to reduce the memory usage of the algorithm. Compared to the state-of-the-art, experimental results show that our work achieves up to a 36.11% improvement in system reliability with comparable algorithm execution time and memory consumption.
Title: DVFS-Based Scrubbing Scheduling for Reliability Maximization on Parallel Tasks in SRAM-based FPGAs
Pub Date: 2020-07-01 · DOI: 10.1109/DAC18072.2020.9218653
Shaahin Angizi, N. Fahmi, W. Zhang, Deliang Fan
In this paper, for the first time, we propose a high-throughput and energy-efficient Processing-in-DRAM-accelerated genome assembler called PIM-Assembler, based on an optimized and hardware-friendly genome assembly algorithm. PIM-Assembler can assemble large-scale DNA sequence datasets from all-pair overlaps. We first develop the PIM-Assembler platform, which harnesses DRAM as computational memory and transforms it into a fundamental processing unit for genome assembly. PIM-Assembler can perform efficient X(N)OR-based operations inside DRAM, incurring low cost on top of commodity DRAM designs (∼5% of chip area). PIM-Assembler is then optimized through a correlated data partitioning and mapping methodology that allows local storage and processing of DNA short reads to fully exploit the genome assembly algorithm's parallelism. The simulation results show that PIM-Assembler achieves on average 8.4× and 2.3× higher throughput for bulk bit-XNOR-based comparison operations compared with CPU and recent processing-in-DRAM platforms, respectively. For the comparison/addition-intensive genome assembly application, it reduces execution time and power by ∼5× and ∼7.5× compared to GPU.
Title: PIM-Assembler: A Processing-in-Memory Platform for Genome Assembly
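A bit-level sketch of the XNOR-style comparison at the core of overlap finding, with ordinary Python integers standing in for DRAM rows (the 2-bit base encoding is an assumption, not the paper's layout): XNOR of two packed reads followed by a popcount yields the number of matching base-bits in one bulk bitwise pass.

```python
# Pack each DNA base into 2 bits, then compare reads with XNOR + popcount.
ENC = {'A': 0b00, 'C': 0b01, 'G': 0b10, 'T': 0b11}

def pack(read):
    # Concatenate 2-bit base codes into one integer, first base in the
    # most significant position.
    word = 0
    for base in read:
        word = (word << 2) | ENC[base]
    return word

def matching_bits(r1, r2, n_bases):
    # XNOR sets a bit wherever the two packed words agree; mask to the
    # 2*n_bases bits actually occupied, then count the ones.
    width = 2 * n_bases
    xnor = ~(pack(r1) ^ pack(r2)) & ((1 << width) - 1)
    return bin(xnor).count('1')

print(matching_bits("ACGT", "ACGA", 4))  # -> 6: three bases (6 bits) agree
```

Note the count is over base-bits, not bases: two differing bases can still share one bit (e.g. A=00 vs C=01), which a per-base comparison would post-process; in DRAM the XNOR is computed across an entire row at once, which is where the throughput gain comes from.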
Pub Date: 2020-07-01 · DOI: 10.1109/DAC18072.2020.9218555
S. Hillmich, I. Markov, R. Wille
Quantum computers promise significant speedups in solving problems intractable for conventional computers but, despite recent progress, remain limited in scaling and availability. Therefore, quantum software and hardware development heavily rely on simulation that runs on conventional computers. Most such approaches perform strong simulation in that they explicitly compute amplitudes of quantum states. However, such information is not directly observable from a physical quantum computer because quantum measurements produce random samples from probability distributions defined by those amplitudes. In this work, we focus on weak simulation that aims to produce outputs which are statistically indistinguishable from those of error-free quantum computers. We develop algorithms for weak simulation based on quantum state representation in terms of decision diagrams. We compare them to using state-vector arrays and binary search on prefix sums to perform sampling. Empirical validation shows, for the first time, that this enables mimicking of physical quantum computers of significant scale.
Title: Just Like the Real Thing: Fast Weak Simulation of Quantum Computation
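The array-based sampling baseline the abstract compares against (binary search on prefix sums over a state-vector array) can be sketched in a few lines; the 3-qubit random state and shot count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)

# A random normalized 3-qubit state vector (2^3 complex amplitudes).
n_qubits = 3
amps = rng.standard_normal(2**n_qubits) + 1j * rng.standard_normal(2**n_qubits)
amps /= np.linalg.norm(amps)

# Weak simulation: draw measurement outcomes from |amplitude|^2 using a
# prefix sum and a per-shot binary search.
probs = np.abs(amps) ** 2
prefix = np.cumsum(probs)
prefix[-1] = 1.0                               # guard against round-off

shots = rng.random(100_000)
outcomes = np.searchsorted(prefix, shots)      # binary search per shot

counts = np.bincount(outcomes, minlength=2**n_qubits) / len(shots)
print(np.max(np.abs(counts - probs)))          # small sampling error
```

The observed frequencies converge to the Born-rule probabilities without ever exposing the amplitudes to the "user", which is what makes the samples statistically indistinguishable from a physical device; the paper's contribution is replacing the explicit array with decision diagrams so the same sampling scales further.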
Pub Date: 2020-07-01 · DOI: 10.1109/DAC18072.2020.9218539
Srikant Bharadwaj, Jieming Yin, Bradford M. Beckmann, T. Krishna
Recent advances in die-stacking and 2.5D chip integration technologies introduce in-package network heterogeneities that can complicate the interconnect design. Integrating chiplets over a silicon interposer offers new opportunities for optimizing interposer topologies. However, limited by the capabilities of existing network-on-chip (NoC) simulators, the full potential of interposer-based NoCs has not been exploited. In this paper, we address the shortfalls of prior NoC designs and present a new family of chiplet topologies called Kite. Kite topologies better utilize the diverse networking and frequency domains existing in new interposer systems and outperform prior chiplet topology proposals. Kite decreases synthetic traffic latency by 7% and improves maximum throughput by 17% on average versus Double Butterfly and Butter Donut, two previous proposals developed using less accurate modeling.
Title: Kite: A Family of Heterogeneous Interposer Topologies Enabled via Accurate Interconnect Modeling