Pub Date: 2020-07-01 · DOI: 10.1109/DAC18072.2020.9218491
Shaza Zeitouni, Emmanuel Stapf, H. Fereidooni, A. Sadeghi
PUFs are cost-effective security primitives that extract unique identifiers from integrated circuits. However, since their introduction, PUFs have been subject to modeling attacks based on machine learning. Recently, researchers have explored emerging nano-electronic technologies, e.g., memristors, to construct hybrid PUFs, which outperform CMOS-only PUFs and are claimed to be more resilient to modeling attacks. However, since such PUF designs are not open-source, the security claims remain dubious. In this paper, we reproduce a set of memristor-PUFs and extensively evaluate their unpredictability property. By leveraging state-of-the-art machine learning algorithms, we show that it is feasible to successfully model memristor-PUFs with high prediction rates of 98%. Even incorporating XOR gates to further strengthen the PUFs against modeling attacks has a negligible effect.
Title: On the Security of Strong Memristor-based Physically Unclonable Functions
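As an illustration of the kind of modeling attack the abstract describes, the sketch below fits a plain logistic-regression model to challenge-response pairs from a simulated linear arbiter-PUF (the standard CMOS strong-PUF model, not the memristor designs studied in the paper); the stage count, CRP budget, and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stages, n_crps = 32, 6000

# Simulated arbiter-PUF: response = sign(w . phi(challenge)),
# where phi is the usual parity-feature transform of the challenge.
w_true = rng.standard_normal(n_stages + 1)

def features(challenges):
    # phi_i = prod_{j >= i} (1 - 2 c_j), plus a constant bias feature
    signs = 1 - 2 * challenges                      # map {0,1} -> {+1,-1}
    phi = np.cumprod(signs[:, ::-1], axis=1)[:, ::-1]
    return np.hstack([phi, np.ones((len(challenges), 1))])

C = rng.integers(0, 2, size=(n_crps, n_stages))
X = features(C)
y = np.sign(X @ w_true)                             # observed responses

# Plain logistic regression trained by gradient descent on 5000 CRPs
Xtr, ytr, Xte, yte = X[:5000], y[:5000], X[5000:], y[5000:]
w = np.zeros(n_stages + 1)
for _ in range(300):
    p = 1 / (1 + np.exp(-Xtr @ w))
    w -= 0.1 * Xtr.T @ (p - (ytr + 1) / 2) / len(ytr)

acc = np.mean(np.sign(Xte @ w) == yte)
print(f"test prediction rate: {acc:.2%}")
```

Because the arbiter-PUF response is a linear threshold function of the parity features, a few thousand CRPs suffice for the model to predict held-out responses with high accuracy, which is the vulnerability the paper probes for memristor-based designs.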
Pub Date: 2020-07-01 · DOI: 10.1109/DAC18072.2020.9218686
Shuai Zhao, Zhe Jiang, Xiaotian Dai, I. Bate, I. Habli, Wanli Chang
General-purpose I/O is widely available on multi- and many-core systems. For real-time applications, I/O operations are often required to be timing-predictable, i.e., bounded in the worst case, and timing-accurate, i.e., occurring at (or near) an exact desired time instant. Unfortunately, both timing requirements are hard to achieve at the system level, especially on many-core architectures, due to the various latency and contention factors present in the path of instigating an I/O request. This paper considers a dedicated I/O co-processing unit and proposes two scheduling methods, with the necessary hardware support implemented. It is the first work that guarantees timing predictability and maximises timing accuracy of I/O tasks in multi- and many-core systems.
Title: Timing-Accurate General-Purpose I/O for Multi- and Many-Core Systems: Scheduling and Hardware Support
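A toy sketch of the predictability-versus-accuracy distinction drawn in the abstract (not the paper's scheduling methods): if the latency of the I/O path is bounded within [bcl, wcl], releasing a request wcl early guarantees the operation never completes late (predictability), while the residual timing inaccuracy is bounded by wcl − bcl.

```python
# Toy model: release a request early by the worst-case latency (wcl);
# the best-case latency (bcl) then bounds how early it can complete.
def release_time(t_target, wcl):
    return t_target - wcl

def completion_window(t_target, bcl, wcl):
    t_rel = release_time(t_target, wcl)
    return (t_rel + bcl, t_rel + wcl)   # (earliest, latest) completion

lo, hi = completion_window(t_target=1000, bcl=40, wcl=70)
print(lo, hi)   # -> 970 1000: never late, at most wcl - bcl = 30 early
```

Shrinking that wcl − bcl window is what dedicated hardware support buys; pure software release-time adjustment can only shift the window, not narrow it.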
Pub Date: 2020-07-01 · DOI: 10.1109/DAC18072.2020.9218704
Aibin Yan, Xiangfeng Feng, Xiaohu Zhao, Han Zhou, Jie Cui, Zuobin Ying, P. Girard, X. Wen
This paper proposes a cost-effective, high-impedance-state (HIS)-insensitive, triple-node-upset (TNU)-tolerant and single-event-transient (SET)-filterable latch, namely HITTSFL, to ensure high reliability at low cost. The latch mainly comprises an output-level SET-filterable Schmitt trigger and three inverters that make the values stored in three parallel single-node-upset (SNU)-recoverable dual-interlocked storage cells (DICEs) converge at a common node to tolerate any possible TNU. To remain insensitive to the HIS, the latch does not use C-elements. Simulation results demonstrate the TNU tolerance and SET filterability of the proposed HITTSFL latch. Moreover, due to the use of clock-gating technologies and fewer transistors, the proposed latch reduces delay, power, and area by 76.65%, 6.16%, and 28.55%, respectively, compared with the state-of-the-art TNU-hardened latch (TNUHL), which cannot filter SETs.
Title: HITTSFL: Design of a Cost-Effective HIS-Insensitive TNU-Tolerant and SET-Filterable Latch for Safety-Critical Applications
Pub Date: 2020-07-01 · DOI: 10.1109/DAC18072.2020.9218659
Stefano Aldegheri, N. Bombieri, F. Fummi, Simone Girardi, R. Muradore, Nicola Piccinelli
We present a toolchain based on Docker and KubeEdge that enables containerization and orchestration of ROS-based robotic SW applications on heterogeneous and hierarchical HW architectures. The toolchain allows for verification of functional and real-time constraints through HW-in-the-loop simulation, and for automatic mapping exploration of the SW across Cloud-Server-Edge architectures. We present the results obtained for the deployment of a real case study composed of an ORB-SLAM application combined with local/global planners with obstacle avoidance for mobile robot navigation.
Title: Late Breaking Results: Enabling Containerized Computing and Orchestration of ROS-based Robotic SW Applications on Cloud-Server-Edge Architectures
Pub Date: 2020-07-01 · DOI: 10.1109/DAC18072.2020.9218675
Qi Liu, Tao Liu, Zihao Liu, Wujie Wen, Chengmo Yang
ReRAM-based neural network accelerators are a promising solution for handling memory- and computation-intensive deep learning workloads. However, they suffer from unique device errors. These errors can accumulate to massive levels at run time and cause significant accuracy drops. It is crucial to obtain the accelerator's fault status in real time before any proper repair mechanism can be applied. However, calibrating such statistical information is non-trivial because of the need for a large number of test patterns, long test time, and high test coverage, considering that complex errors may appear in million-to-billion weight parameters. In this paper, we leverage the concept of corner data, which can significantly confuse the decision making of a neural network model, as well as the training algorithm, to generate only a small set of test patterns that is tuned to be sensitive to different levels of error accumulation and accuracy loss. Experimental results show that our method can quickly and correctly report the fault status of a running accelerator, outperforming existing solutions in both detection efficiency and cost.
Title: Monitoring the Health of Emerging Neural Network Accelerators with Cost-effective Concurrent Test
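A minimal sketch of the corner-data intuition, with a hypothetical random linear classifier standing in for a real DNN: inputs whose top-2 score margin is smallest sit closest to a decision boundary, so their predictions flip far more readily under small parameter faults, making a handful of them cheap concurrent test patterns. All sizes and the fault magnitude are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for a deployed model: a 10-class linear classifier.
W = rng.standard_normal((10, 64))
X = rng.standard_normal((2000, 64))

def margins(W_, X_):
    logits = X_ @ W_.T
    top2 = np.sort(logits, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]          # top-1 score minus top-2 score

# "Corner data": the handful of inputs closest to a decision boundary.
corner_idx = np.argsort(margins(W, X))[:16]

# Inject a small random weight fault and compare prediction flip rates.
W_faulty = W + 0.05 * rng.standard_normal(W.shape)
pred = lambda W_, idx: np.argmax(X[idx] @ W_.T, axis=1)
corner_flips = np.mean(pred(W, corner_idx) != pred(W_faulty, corner_idx))
all_flips = np.mean(pred(W, np.arange(2000)) != pred(W_faulty, np.arange(2000)))
print(f"flip rate on corners: {corner_flips:.2f}, on all inputs: {all_flips:.2f}")
```

The paper's contribution is generating such boundary-hugging patterns for deep models via the training algorithm, tuned to distinct levels of error accumulation; the sketch only shows why small-margin inputs are sensitive detectors.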
Pub Date: 2020-07-01 · DOI: 10.1109/DAC18072.2020.9218577
Wenye Liu, Chip-Hong Chang, Fan Zhang, Xiaoxuan Lou
The convergence of edge computing and deep learning empowers endpoint hardware, or edge devices, to perform inference locally with the help of deep neural network (DNN) accelerators. This trend of edge intelligence invites new attack vectors, which are methodologically different from well-known software-oriented deep learning attacks such as adversarial examples. Current studies of threats on DNN hardware focus mainly on the manipulation of model parameters. Such manipulation is not stealthy, as it leaves non-erasable traces or creates conspicuous output patterns. In this paper, we present and investigate an imperceptible misclassification attack on DNN hardware that introduces infrequent instantaneous glitches into the clock signal. Compared with falsifying model parameters through permanent faults, intermittently corrupting targeted intermediate results of convolution layer(s) by disrupting the associated computations leaves no trace. We demonstrate our attack on nine state-of-the-art ImageNet models running on a Xilinx FPGA-based deep learning accelerator. With no knowledge about the models, our attack achieves over 98% misclassification on 8 out of 9 models with glitches launched into only 10% of the computation clock cycles. Given the model details and inputs, all the test images applied to ResNet50 can be successfully misclassified with no more than 1.7% glitch injection.
Title: Imperceptible Misclassification Attack on Deep Learning Accelerator by Glitch Injection
Pub Date: 2020-07-01 · DOI: 10.1109/DAC18072.2020.9218574
Rui Li, Heng Yu, Weixiong Jiang, Y. Ha
To obtain high reliability while avoiding the huge area overhead of traditional triple modular redundancy (TMR) methods in SRAM-based FPGAs, scrubbing-based methods reconfigure the configuration memory of each task just before its execution. However, because the FPGA reconfiguration module can only scrub one task at a time, parallel tasks may leave stringent timing requirements for scheduling their scrubbing processes. Scrubbing requests may thus be delayed or omitted, leading to a less reliable system. To address this issue, we propose a novel optimal DVFS-based scrubbing algorithm that adjusts the execution time of user tasks, significantly enhancing the chance of scheduling scrubbing successfully for parallel tasks. Besides, we develop an approximation algorithm to speed up the optimal version and a novel K-Means-based method to reduce the memory usage of the algorithm. Compared to the state-of-the-art, experimental results show that our work achieves up to a 36.11% improvement in system reliability with comparable algorithm execution time and memory consumption.
Title: DVFS-Based Scrubbing Scheduling for Reliability Maximization on Parallel Tasks in SRAM-based FPGAs
Pub Date: 2020-07-01 · DOI: 10.1109/DAC18072.2020.9218653
Shaahin Angizi, N. Fahmi, W. Zhang, Deliang Fan
In this paper, for the first time, we propose a high-throughput and energy-efficient Processing-in-DRAM-accelerated genome assembler called PIM-Assembler, based on an optimized and hardware-friendly genome assembly algorithm. PIM-Assembler can assemble large-scale DNA sequence datasets from all-pair overlaps. We first develop the PIM-Assembler platform, which harnesses DRAM as computational memory and transforms it into a fundamental processing unit for genome assembly. PIM-Assembler can perform efficient X(N)OR-based operations inside DRAM, incurring low cost on top of commodity DRAM designs (∼5% of chip area). PIM-Assembler is then optimized through a correlated data partitioning and mapping methodology that allows local storage and processing of DNA short reads to fully exploit the genome assembly algorithm's parallelism. The simulation results show that PIM-Assembler achieves on average 8.4× and 2.3× higher throughput for bulk bit-XNOR-based comparison operations compared with CPU and recent processing-in-DRAM platforms, respectively. For the comparison/addition-intensive genome assembly application, it reduces execution time and power by ∼5× and ∼7.5× compared to GPU.
Title: PIM-Assembler: A Processing-in-Memory Platform for Genome Assembly
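A bit-level sketch of the XNOR-style comparison at the core of overlap finding, with ordinary Python integers standing in for DRAM rows (the 2-bit base encoding is an assumption, not the paper's layout): XNOR of two packed reads followed by a popcount yields the number of matching base-bits in one bulk bitwise pass.

```python
# Pack each DNA base into 2 bits, then compare reads with XNOR + popcount.
ENC = {'A': 0b00, 'C': 0b01, 'G': 0b10, 'T': 0b11}

def pack(read):
    # Concatenate 2-bit base codes into one integer, first base in the
    # most significant position.
    word = 0
    for base in read:
        word = (word << 2) | ENC[base]
    return word

def matching_bits(r1, r2, n_bases):
    # XNOR sets a bit wherever the two packed words agree; mask to the
    # 2*n_bases bits actually occupied, then count the ones.
    width = 2 * n_bases
    xnor = ~(pack(r1) ^ pack(r2)) & ((1 << width) - 1)
    return bin(xnor).count('1')

print(matching_bits("ACGT", "ACGA", 4))  # -> 6: three bases (6 bits) agree
```

Note the count is over base-bits, not bases: two differing bases can still share one bit (e.g. A=00 vs C=01), which a per-base comparison would post-process; in DRAM the XNOR is computed across an entire row at once, which is where the throughput gain comes from.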
Pub Date: 2020-07-01 · DOI: 10.1109/DAC18072.2020.9218555
S. Hillmich, I. Markov, R. Wille
Quantum computers promise significant speedups in solving problems intractable for conventional computers but, despite recent progress, remain limited in scaling and availability. Therefore, quantum software and hardware development heavily rely on simulation that runs on conventional computers. Most such approaches perform strong simulation in that they explicitly compute amplitudes of quantum states. However, such information is not directly observable from a physical quantum computer because quantum measurements produce random samples from probability distributions defined by those amplitudes. In this work, we focus on weak simulation that aims to produce outputs which are statistically indistinguishable from those of error-free quantum computers. We develop algorithms for weak simulation based on quantum state representation in terms of decision diagrams. We compare them to using state-vector arrays and binary search on prefix sums to perform sampling. Empirical validation shows, for the first time, that this enables mimicking of physical quantum computers of significant scale.
Title: Just Like the Real Thing: Fast Weak Simulation of Quantum Computation
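The array-based sampling baseline the abstract compares against (binary search on prefix sums over a state-vector array) can be sketched in a few lines; the 3-qubit random state and shot count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)

# A random normalized 3-qubit state vector (2^3 complex amplitudes).
n_qubits = 3
amps = rng.standard_normal(2**n_qubits) + 1j * rng.standard_normal(2**n_qubits)
amps /= np.linalg.norm(amps)

# Weak simulation: draw measurement outcomes from |amplitude|^2 using a
# prefix sum and a per-shot binary search.
probs = np.abs(amps) ** 2
prefix = np.cumsum(probs)
prefix[-1] = 1.0                               # guard against round-off

shots = rng.random(100_000)
outcomes = np.searchsorted(prefix, shots)      # binary search per shot

counts = np.bincount(outcomes, minlength=2**n_qubits) / len(shots)
print(np.max(np.abs(counts - probs)))          # small sampling error
```

The observed frequencies converge to the Born-rule probabilities without ever exposing the amplitudes to the "user", which is what makes the samples statistically indistinguishable from a physical device; the paper's contribution is replacing the explicit array with decision diagrams so the same sampling scales further.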
Pub Date: 2020-07-01 · DOI: 10.1109/DAC18072.2020.9218539
Srikant Bharadwaj, Jieming Yin, Bradford M. Beckmann, T. Krishna
Recent advances in die-stacking and 2.5D chip integration technologies introduce in-package network heterogeneities that can complicate the interconnect design. Integrating chiplets over a silicon interposer offers new opportunities for optimizing interposer topologies. However, limited by the capabilities of existing network-on-chip (NoC) simulators, the full potential of interposer-based NoCs has not been exploited. In this paper, we address the shortfalls of prior NoC designs and present a new family of chiplet topologies called Kite. Kite topologies better utilize the diverse networking and frequency domains existing in new interposer systems and outperform prior chiplet topology proposals. Kite decreases synthetic traffic latency by 7% and improves maximum throughput by 17% on average versus Double Butterfly and Butter Donut, two previous proposals developed using less accurate modeling.
Title: Kite: A Family of Heterogeneous Interposer Topologies Enabled via Accurate Interconnect Modeling