2014 International Conference on Field-Programmable Technology (FPT): Latest Publications


Highly scalable, shared-memory, Monte-Carlo tree search based Blokus Duo Solver on FPGA

2014 International Conference on Field-Programmable Technology (FPT)

Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082823

Ehsan Qasemi, Amir Samadi, Mohammad H. Shadmehr, Bardia Azizian, Sajjad Mozaffari, Amir Shirian, B. Alizadeh

In this paper, we present a hardware architecture for a highly scalable, shared-memory, Monte-Carlo Tree Search (MCTS) based Blokus-Duo solver. In the proposed architecture, each MCTS solver module contains a centralized MCTS controller, which can also be implemented using soft-cores, with true dual-port access to a shared memory called main memory, and a multitude of MCTS engines, each containing several simulation cores. Consequently, this highly flexible architecture guarantees the optimized performance of the solver regardless of the actual FPGA platform used. Our design is inspired by parallel MCTS algorithms and is potentially capable of obtaining the maximum possible parallelism from the MCTS algorithm. In addition, we combine MCTS with pruning heuristics to increase both memory and LE utilization. The results show that our architecture can run at up to 50 MHz on the DE2-115 platform, where each simulation core requires 11K LEs and the MCTS controller requires 10K LEs.


Citations: 4
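
As a rough illustration of the MCTS loop that the Blokus Duo abstract above builds on, the following Python sketch shows a selection step and a batched backpropagation step. It is a minimal sketch under stated assumptions, not the authors' hardware design: the `Node` class, the UCB1 selection rule, the exploration constant `c`, and the idea of folding several playout results (for example, one per simulation core) into the tree at once are all illustrative choices that the abstract does not specify. For brevity it also ignores the alternation of player perspective between tree levels.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical search-tree node; field names are illustrative."""
    visits: int = 0
    wins: float = 0.0
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def ucb1(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """UCB1 score; unvisited children are always tried first."""
    if child.visits == 0:
        return math.inf
    exploit = child.wins / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(node: Node) -> Node:
    """Pick the child with the highest UCB1 score."""
    return max(node.children, key=lambda ch: ucb1(ch, node.visits))


def backpropagate(leaf: Node, results: List[float]) -> None:
    """Fold a batch of playout results into every node on the path to the root."""
    node: Optional[Node] = leaf
    while node is not None:
        node.visits += len(results)
        node.wins += sum(results)
        node = node.parent
```

Batching the results in `backpropagate` is one simple way to model several simulations finishing for the same leaf; how the actual controller and engines share the tree in main memory is described only at the level of the abstract.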

Comparing performance, productivity and scalability of the TILT overlay processor to OpenCL HLS

2014 International Conference on Field-Programmable Technology (FPT)

Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082748

Rafat Rashid, J. Steffan, Vaughn Betz

High-Level-Synthesis (HLS) tools translate a software description of an application into custom FPGA logic, increasing designer productivity vs. Hardware Description Language (HDL) design flows. Overlays seek to further improve productivity by reducing application compile times and raising abstraction by enabling the designer to target a software-programmable substrate instead of the underlying FPGA. We compare the performance, development effort and scalability of two C-to-FPGA approaches: our TILT overlay processor and Altera's OpenCL HLS. Our application-customized TILT implementations of five data-parallel benchmarks have from 41% to 80% of the throughput per unit of layout area achieved by our best OpenCL HLS designs. The time required for initial hardware compilation of these TILT designs and configuration of the target application onto the overlay is roughly comparable to the compile times of the OpenCL HLS designs: 28 and 103 minutes on average, respectively. However, subsequent reconfigurations due to changes in the application that do not require re-synthesis of the overlay are fast, taking 38 seconds on average. In contrast, OpenCL HLS applications require full recompilation after every code change. TILT also enables smaller, more area-efficient designs than OpenCL HLS when low to moderate throughput is sufficient. For high throughput, the larger spatially pipelined designs of OpenCL HLS are preferable.


Citations: 42

A novel three-dimensional FPGA architecture with high-speed serial communication links

2014 International Conference on Field-Programmable Technology (FPT)

Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082805

T. Kajiwara, Qian Zhao, M. Amagasaki, M. Iida, Morituro Kuga, T. Sueyoshi

Three-dimensional (3D) integrated circuit technology is expected to offer continual improvement to very-large-scale integration performance as the process of miniaturization approaches physical limits. However, because the through-silicon vias (TSVs) that are used to create interlayer vertical connections are much larger in area than transistors, there is an inherent tradeoff between connectivity and small size. Field-programmable gate arrays (FPGAs) are particularly noted for requiring a high level of routing resources, which means that it is unrealistic to make the same number of connections vertically as horizontally. In previous research, we proposed a method for creating a two-layer compact 3D FPGA with face-down integration (the base FPGA). In this paper, we discuss stacking multiple base FPGAs by the face-up method and propose a method for achieving high-speed interlayer communications with TSV serial connections. The proposed architecture improves FPGA performance by using smaller TSVs. The evaluation results show that the proposed 3D FPGA can achieve a total area that is as low as 67% of that of the equivalent two-dimensional FPGA.


Citations: 3

Deep and narrow binary content-addressable memories using FPGA-based BRAMs

2014 International Conference on Field-Programmable Technology (FPT)

Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082808

Ameer Abdelhadi, G. Lemieux

Binary Content Addressable Memories (BCAMs) are massively parallel search engines capable of searching the entire memory space in a single clock cycle. BCAMs are used in a wide range of applications, such as memory management, networks, data compression, DSP, and databases. Due to the increasing amount of processed information, modern BCAM applications demand a deep searching space. However, traditional BCAM approaches in FPGAs suffer from storage inefficiency. In this paper, a novel and efficient technique for constructing deep and narrow BCAMs out of standard SRAM blocks in FPGAs is proposed. This technique is most efficient for deep and narrow CAMs since the BRAM consumption is exponential in pattern width. Using Altera's Stratix V device, traditional methods achieve up to 64K-entry BCAMs, while the proposed technique achieves up to 4M entries. For the 64K-entry test case, traditional methods consume 43 times more ALMs and achieve only one-third of the Fmax. A fully parameterized Verilog implementation is available. This implementation has been extensively tested using Altera's tools.


Citations: 9
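
The BCAM abstract above notes that RAM consumption grows exponentially with pattern width. One way to see why is a software model of a generic RAM-based CAM in which the search pattern is used as the RAM address and each row stores a bit vector marking which CAM addresses hold that pattern. The Python sketch below is a minimal model under that assumption; the class name `TransposedRamBcam` and its methods are hypothetical, and it is not the authors' Verilog implementation or their specific technique.

```python
from typing import List, Optional


class TransposedRamBcam:
    """Toy model of a CAM built from a pattern-indexed RAM (illustrative only)."""

    def __init__(self, pattern_width: int, depth: int) -> None:
        self.pattern_width = pattern_width
        self.depth = depth
        # One row per possible pattern: 2**pattern_width rows, which is why
        # storage grows exponentially with pattern width.
        self.indicators: List[int] = [0] * (1 << pattern_width)
        # Shadow copy of the stored patterns, needed to clear the old match bit.
        self.stored: List[Optional[int]] = [None] * depth

    def write(self, address: int, pattern: int) -> None:
        """Store `pattern` at CAM `address`, replacing any previous pattern."""
        if not 0 <= address < self.depth:
            raise ValueError("address out of range")
        if not 0 <= pattern < (1 << self.pattern_width):
            raise ValueError("pattern out of range")
        old = self.stored[address]
        if old is not None:
            self.indicators[old] &= ~(1 << address)  # erase the stale match bit
        self.indicators[pattern] |= 1 << address
        self.stored[address] = pattern

    def search(self, pattern: int) -> List[int]:
        """Return every CAM address holding `pattern` with a single row read."""
        row = self.indicators[pattern]
        return [addr for addr in range(self.depth) if (row >> addr) & 1]
```

A search touches exactly one row regardless of depth, which mirrors the single-cycle lookup described in the abstract, while the row count of `2**pattern_width` illustrates why narrow patterns suit this style of construction.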

Hardware architecture of bi-cubic convolution interpolation for real-time image scaling

2014 International Conference on Field-Programmable Technology (FPT)

Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082790

Gopinath Mahale, H. Mahale, Rajesh Babu Parimi, S. Nandy, S. Bhattacharya

This paper presents two hardware architectures of bi-cubic convolution interpolation, termed Parallelized Row Column Interpolation Architecture (PRCIA) and Serialized Row Column Interpolation Architecture (SRCIA), for real-time image scaling. These architectures factor in the challenges of high computational complexity, redundant computations and repeated memory accesses, which were otherwise not explicitly addressed in existing architectures. In addition, the proposed architectures employ parallel computations to improve the throughput for real-time applications. The proposed architectures have been emulated and tested on a Virtex-6 FPGA. The emulated PRCIA and SRCIA are able to scale input grayscale images of dimensions up to 640 × 480 at 59 and 48 frames per second, respectively, with arbitrary scaling factors up to 4 in both dimensions.


Citations: 5
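
For readers unfamiliar with the bi-cubic convolution interpolation named in the abstract above, the following Python sketch shows the separable row-then-column form in software. It is a minimal sketch under stated assumptions: it uses the commonly cited cubic convolution kernel with parameter `a = -0.5`, which the abstract does not confirm, and the function names are hypothetical. It says nothing about how PRCIA or SRCIA schedule these computations in hardware.

```python
from typing import List, Sequence


def cubic_kernel(x: float, a: float = -0.5) -> float:
    """Piecewise-cubic convolution kernel with support on (-2, 2)."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0


def interp_1d(p: Sequence[float], t: float) -> float:
    """Interpolate between p[1] and p[2]; p holds samples at offsets -1, 0, 1, 2
    and t in [0, 1) is the fractional distance from p[1]."""
    return sum(p[i] * cubic_kernel(t - (i - 1)) for i in range(4))


def interp_2d(patch: Sequence[Sequence[float]], tx: float, ty: float) -> float:
    """Separable bi-cubic interpolation over a 4x4 patch: rows first, then the column."""
    row_values: List[float] = [interp_1d(row, tx) for row in patch]
    return interp_1d(row_values, ty)
```

Each output pixel needs four row interpolations followed by one column interpolation over a 4x4 neighbourhood, and neighbouring output pixels reuse many of the same input samples. That reuse is the kind of redundant computation and repeated memory access the abstract says the architectures take into account.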

Memory security in reconfigurable computers: Combining formal verification with monitoring

2014 International Conference on Field-Programmable Technology (FPT)

Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082771

T. Wiersema, Stephanie Drzevitzky, M. Platzner

Ensuring memory access security is a challenge for reconfigurable systems with multiple cores. Previous work introduced access monitors attached to the memory subsystem to ensure that the cores adhere to pre-defined protocols when accessing memory. In this paper, we combine access monitors with a formal runtime verification technique known as proof-carrying hardware to guarantee memory security. We extend previous work on proof-carrying hardware by covering sequential circuits and demonstrate our approach with a prototype leveraging ReconOS/Zynq with an embedded ZUMA virtual FPGA overlay. Experiments show the feasibility of the approach and the capabilities of the prototype, which constitutes the first realization of proof-carrying hardware on real FPGAs. The area overheads for the virtual FPGA are measured as 2x-10x, depending on the resource type. The delay overhead is substantial at almost 100x, but this is an extremely pessimistic estimate that will be lowered once accurate timing analysis for FPGA overlays becomes available. Finally, reconfiguration time for the virtual FPGA is about one order of magnitude lower than for the native Zynq fabric.


Citations: 13

High performance relevance vector machine on HMPSoC

2014 International Conference on Field-Programmable Technology (FPT)

Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082812

Yongfu He, Shaojun Wang, Yu Peng, Y. Pang, Ning Ma, Jingyue Pang

Relevance Vector Machine (RVM), with its ability to express uncertainty, has spawned broad applications in Prognostics and Health Management (PHM). However, the computationally intensive nature of RVM greatly limits its usage. This paper presents a software and hardware co-design approach based on HMPSoC technology, which efficiently exploits the sequential and parallel nature of RVM. A multi-channel, pipelined hardware architecture for accelerating kernel formulation and intermediate value calculation is proposed. The hardware, wrapped with an AXI-Stream interface, is integrated into the HMPSoC as an acceleration engine. We implement the design on an on-board PHM prototype platform with a Xilinx Zynq XC7Z020 AP SoC. The experimental results show 5.3× and 46.8× speedups in time cost over RVM running on a PC with a Xeon 5620 processor and on an ARM Cortex A9 processor, respectively. The energy consumption is reduced by 153.0× and 37.3×, respectively.


Citations: 8

Real-time 3D reconstruction for FPGAs: A case study for evaluating the performance, area, and programmability trade-offs of the Altera OpenCL SDK

2014 International Conference on Field-Programmable Technology (FPT)

Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082810

Q. Gautier, A. Shearer, J. Matai, D. Richmond, Pingfan Meng, R. Kastner

Embedding real-time 3D reconstruction of a scene from a low-cost depth sensor can improve the development of technologies in the domains of augmented reality, mobile robotics, and more. However, current implementations require a computer with a powerful GPU, which limits prospective applications with low-power requirements. To implement low-power 3D reconstruction, we embedded two prominent 3D reconstruction algorithms (Iterative Closest Point and Volumetric Integration) on an Altera Stratix V FPGA by using the OpenCL language and the Altera OpenCL SDK. In this paper, we present our application and our evaluation of the Altera tool in terms of performance, area, and programmability trade-offs. We have verified that OpenCL can be a viable method for developing FPGA applications by modifying an open-source version of the Microsoft KinectFusion project to run partially on an FPGA.


Citations: 15

Hardware Trojan detection acceleration based on word-level statistical properties management

2014 International Conference on Field-Programmable Technology (FPT)

Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082769

He Li, Qiang Liu

Hardware Trojan insertion has raised serious concerns in the semiconductor industry and government agencies. A hardware Trojan is usually activated under rare conditions associated with low-transition bits in a circuit. The damage includes circuit functional failure or leakage of important information. Previous research on hardware Trojan detection is mainly based on side-channel analysis and Trojan activation. Long activation time is a major concern during the detection process. In this paper, we propose a novel approach for efficiently accelerating Trojan activation by increasing the transition activity of rare bits. In particular, the proposed approach increases the bit-level transition activity by controlling word-level statistical properties of the signal, such as its variance and autocorrelation. In addition, by analyzing how the statistical properties of a signal propagate through various digital signal processing (DSP) operators such as adders and multipliers, the proposed approach can control the statistical properties of internal signals and thereby enhance the internal bit transition activity from the primary input of the circuit. The proposed approach is evaluated on several circuits. The results show that the transition activity of rare bits can be dramatically increased by up to 166.7 times and Trojan activation time can be reduced by up to 121 times.


Citations: 11

A high-performance low-power near-Vt RRAM-based FPGA

2014 International Conference on Field-Programmable Technology (FPT)

Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082777

Xifan Tang, P. Gaillardon, G. Micheli

The routing architecture, heavily using programmable switches, dominates the area, delay and power of Field Programmable Gate Arrays (FPGAs). Resistive Random Access Memories (RRAMs) enable high-performance routing architectures through the replacement of Static Random Access Memory (SRAM)-based programming switches. Exploiting the very low on-resistance state achievable by RRAMs, RRAM-based routing multiplexers can be used to significantly reduce the FPGA routing delays. In addition, RRAM-based routing architectures are less sensitive to supply voltage reductions and show promise in low-power FPGA designs. In this paper, we propose a near-Vt low-power RRAM-based FPGA where both delay and power reductions are achieved. Experimental results demonstrate that a near-Vt RRAM-based FPGA design leads to a 15% area shrink, a 10% delay reduction, and a 65% power improvement, compared to a conventional FPGA design for a given technology node. To achieve low on-resistance values, RRAMs typically require high programming currents. In other words, they need relatively large programming transistors, potentially resulting in area, delay and power inefficiencies. We also present a design methodology to properly size the programming transistors of RRAMs in order to further improve the area-efficiency. Experimental results show that a correct programming transistor sizing strategy contributes a further 18% area and 2% delay reduction, compared to the initial near-Vt RRAM-based FPGA.


Citations: 48
