
2014 International Conference on Field-Programmable Technology (FPT): Latest Papers

Doing FPGA in a former software company
Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082744
Feng-Hsiung Hsu
Microsoft has gone through massive changes in the last few years. First, it was the dominant software company. Then it became a “Devices and Services” company, and now it is “Mobile First, Cloud First”. Of course, deep down in its bones, it is still a software company. In this talk, I will give a personal account of how FPGA acceleration gradually gained traction inside Microsoft, the difficulties and lessons learned in gaining acceptance, the apparently imminent deployment of FPGAs in Microsoft data centers, and finally what FPGA programming software tools may need in order to achieve wider acceptance inside a company like Microsoft.
Pages: 3
Citations: 0
Power supply noise aware evaluation framework for side channel attacks and countermeasures
Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082770
Jianlei Yang, Chenguang Wang, Yici Cai, Qiang Zhou
Side Channel Attack (SCA) aims to extract secret information from cryptographic chips by analyzing the leakage of physical parameters. Power-analysis-based SCA is a popular approach to obtaining secret keys by monitoring the power consumption of cryptographic chips. However, most SCA evaluations are performed on FPGA platforms, while many parasitic physical effects cannot be revealed before the cryptographic chips are taped out. Simply ignoring these effects is misleading, because the measurement noise they introduce can significantly increase the difficulty of an attack. Power supply noise has been observed to be critical for power-analysis-based SCA. This paper demonstrates a power-supply-noise-aware evaluation framework for practical side channel attacks, spanning from cryptographic system design to physical design. The on-chip power delivery network is implemented during the physical design stage, so the supply noise of the power network can be explored based on the post-layout implementation. Additionally, the countermeasures of cryptographic chips could be enhanced by placing on-chip decoupling capacitors, which influence the characteristics of the power delivery network.
Pages: 161-166
Citations: 2
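To illustrate why supply noise matters for power-analysis attacks, here is a minimal Python sketch of correlation power analysis (CPA) on simulated traces. It assumes a Hamming-weight leakage model for one 8-bit intermediate value `plaintext XOR key`, with additive Gaussian noise standing in for power-supply and measurement noise; the function names, the leakage model and the noise levels are illustrative assumptions, not the evaluation framework described in the paper.

```python
import math
import random

def hamming_weight(value: int) -> int:
    """Number of set bits, a common power-leakage model for CMOS logic."""
    return bin(value).count("1")

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def simulate_traces(plaintexts, key, noise_sigma, rng):
    """One leakage sample per encryption: HW(p XOR key) plus Gaussian noise.
    The Gaussian term is an assumed stand-in for supply and measurement noise."""
    return [hamming_weight(p ^ key) + rng.gauss(0.0, noise_sigma) for p in plaintexts]

def recover_key(plaintexts, traces):
    """CPA over all 256 guesses of one key byte.
    Signed correlation is used because, for a bare XOR leak, the hypothesis of
    the complemented key is exactly 8 - HW of the correct one, so taking the
    absolute correlation would produce a tie."""
    def score(guess):
        hypothesis = [hamming_weight(p ^ guess) for p in plaintexts]
        return pearson(hypothesis, traces)
    return max(range(256), key=score)
```

Because the Hamming weight of a uniformly random byte has variance 2, the expected correlation at the correct key is roughly sqrt(2) / sqrt(2 + noise_sigma**2), so larger supply noise directly shrinks the signal the attacker must detect and increases the number of traces needed.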
Achieving higher performance of memcached by caching at network interface
Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082799
E. Fukuda, Hiroaki Inoue, Takashi Takenaka, Dahoo Kim, Tsunaki Sadahisa, T. Asai, M. Motomura
As the volume of data that web services handle grows, many web service providers use memcached, an in-memory key-value store, to improve their web servers' performance. While memcached usually runs on a server with a high-performance processor, various hardware platforms have been evaluated for running memcached in order to achieve higher performance. Recently, several works using FPGAs have achieved higher performance than Xeon processors. These works, however, struggle to utilize large memories with FPGAs. In this paper, we propose a system that overcomes this problem and enhances memcached by caching a part of software memcached's commands and data on a network interface card (NIC) equipped with an FPGA and DRAM. Our evaluation showed that the NIC cache achieves a hit rate below 30% for workloads with the Latest key-selection distribution, and 30% to 60% for workloads with a Zipf distribution.
Pages: 288-289
Citations: 1
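A cache placed on the NIC in front of software memcached is only useful if enough requests hit it, and the hit rate depends strongly on how skewed the key popularity is. The Python sketch below assumes a plain LRU replacement policy, read-only GET traffic and a Zipf exponent of 1.0; these are illustrative assumptions, and neither the paper's replacement policy nor its Latest workload is modeled.

```python
import random
from collections import OrderedDict

class LRUCache:
    """Small front cache (standing in for the NIC cache) over a backing store."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()   # key -> value, least recently used first
        self.hits = 0
        self.misses = 0

    def get(self, key, backing_store):
        if key in self.entries:
            self.entries.move_to_end(key)   # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = backing_store[key]          # miss: forwarded to host memcached
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used key
        return value

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

def zipf_keys(num_keys, num_requests, s, rng):
    """Draw keys whose popularity follows rank**(-s)."""
    weights = [1.0 / (rank ** s) for rank in range(1, num_keys + 1)]
    return rng.choices(range(num_keys), weights=weights, k=num_requests)

def simulate(keys, capacity, num_keys):
    store = {k: f"value-{k}" for k in range(num_keys)}
    cache = LRUCache(capacity)
    for key in keys:
        cache.get(key, store)
    return cache.hit_rate
```

With 10,000 keys and room for 500 of them, uniformly random keys hit only about 5% of the time, while Zipf-distributed keys hit far more often because a few hot keys dominate the traffic.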
Area efficient floating-point adder and multiplier with IEEE-754 compatible semantics
Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082765
A. Ehliar
In this paper we describe an open-source floating-point adder and multiplier implemented using a 36-bit custom number format based on radix 16 and optimized for the Xilinx 7-series FPGAs. Although this number format is not identical to the IEEE-754 single-precision format, the floating-point operators are designed so that the numerical result of a given operation is identical to the result of an IEEE-754-compliant operator with support for round-to-nearest-even, NaNs and Infs, and subnormal numbers. The drawback of this number format is that the rounding step is more involved than in a regular radix-2 operator. On the other hand, the high radix reduces the area cost of normalization and denormalization, giving the custom number format a net area advantage under the assumption that support for subnormal numbers is required. In a Kintex-7 FPGA, the floating-point adder occupies 261 slice LUTs and the floating-point multiplier occupies 235 slice LUTs and 2 DSP48E blocks. The adder can operate at 319 MHz and the multiplier at 305 MHz.
Pages: 131-138
Citations: 24
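The area argument for a high radix can be seen in the normalization step. A radix-2 normalizer may shift the mantissa by any bit count, while a radix-16 normalizer only shifts by whole hex digits, so its shifter and leading-zero logic handle a quarter as many shift amounts, at the price of up to 3 leading zero bits left in the normalized mantissa. The Python sketch below illustrates only that idea on plain integers; the mantissa width and function names are assumptions, and it does not reproduce the paper's 36-bit format or its rounding logic.

```python
def normalize_radix2(mantissa: int, width: int):
    """Shift left one bit at a time until the MSB is set.
    Returns (normalized mantissa, number of bit positions shifted)."""
    if mantissa == 0:
        return 0, 0
    mask = (1 << width) - 1
    shift = 0
    while ((mantissa >> (width - 1)) & 1) == 0:
        mantissa = (mantissa << 1) & mask
        shift += 1
    return mantissa, shift

def normalize_radix16(mantissa: int, width: int):
    """Shift left one hex digit (4 bits) at a time until the top hex digit is
    non-zero. The exponent counts hex digits, so it changes by the digit count.
    Up to 3 leading zero bits may remain in the normalized mantissa."""
    if width % 4 != 0:
        raise ValueError("width must be a multiple of 4")
    if mantissa == 0:
        return 0, 0
    mask = (1 << width) - 1
    digits = 0
    while ((mantissa >> (width - 4)) & 0xF) == 0:
        mantissa = (mantissa << 4) & mask
        digits += 1
    return mantissa, digits
```

For the 32-bit value 0x00012345, normalize_radix2 shifts 15 bit positions to 0x91A28000, whereas normalize_radix16 shifts 3 hex digits to 0x12345000, leaving 3 leading zero bits in the top digit.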
Parallel resampling for particle filters on FPGAs
Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082775
Shuanglong Liu, Grigorios Mingas, C. Bouganis
Particle filters (PFs) are a set of algorithms that implement recursive Bayesian filtering, representing the posterior distribution by a set of weighted samples. Resampling is a fundamental operation in PF algorithms. It consists of taking a population of samples and reconstructing it based on the weights attached to each sample, favouring the samples with large weights. However, resampling is computationally intensive when the number of samples is large and, most importantly, it is not inherently parallelizable like the other steps of the particle filter. Parallel computing devices such as Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs) have been proposed to accelerate resampling. In this paper, we propose novel parallel architectures that map four state-of-the-art resampling algorithms (systematic, residual systematic, Metropolis and rejection resampling) onto an FPGA. FPGA-specific optimisations are introduced to further improve the performance of these systems. The proposed architectures are implemented on a Virtex-6 LX240T FPGA, using half of the device's logic resources.
Compared to the respective state-of-the-art implementations on an NVIDIA K20 GPU, the achieved speedups range from 1.7x to 49x.
Pages: 191-198
Citations: 21
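Systematic resampling, the first of the four algorithms listed, draws a single random offset u0 in [0, 1/N) and places N evenly spaced pointers u0 + i/N on the cumulative weight distribution; each pointer selects the particle whose cumulative-weight interval contains it. The walk through the cumulative sum is what makes the step hard to parallelize. The Python sketch below is a sequential software reference of that textbook algorithm, not the paper's FPGA architecture; the optional `u0` argument is an assumption added to make results reproducible.

```python
import random
from itertools import accumulate

def systematic_resample(weights, u0=None, rng=random):
    """Return N particle indices drawn by systematic resampling.

    weights: non-negative particle weights (need not sum to 1).
    u0: optional fixed offset in [0, 1/N); drawn at random if omitted."""
    n = len(weights)
    total = sum(weights)
    cumulative = [c / total for c in accumulate(weights)]  # normalized CDF
    if u0 is None:
        u0 = rng.random() / n          # single random offset in [0, 1/n)
    indices = []
    j = 0
    for i in range(n):
        position = u0 + i / n          # evenly spaced pointers
        # Advance to the first particle whose cumulative weight exceeds the
        # pointer; the j < n - 1 guard absorbs rounding in the last entry.
        while j < n - 1 and cumulative[j] <= position:
            j += 1
        indices.append(j)
    return indices
```

For weights [0.1, 0.2, 0.3, 0.4] and offset u0 = 0.2, the pointers fall at 0.2, 0.45, 0.7 and 0.95, giving indices [1, 2, 3, 3]: the heavier particles are duplicated and the lightest one is dropped.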
An FPGA-based spectral anomaly detection system
Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082772
Duncan J. M. Moss, Zhe Zhang, Nicholas J. Fraser, P. Leong
Anomaly detection based on spectral features is applicable to a diverse range of problems including prognostics and health management, vibration analysis, astronomy, biomedical engineering and computational finance. The input data could be regularly sampled, as in the case of a standard analogue-to-digital converter sampling a bandlimited signal above the Nyquist rate, or irregularly sampled, as in the case of stock quotes or astronomical data. In this paper, we present new online algorithms for computing power spectra of regularly or irregularly sampled data, and for performing anomaly detection on time series data. Both algorithms allow hardware implementations with O(1) time complexity, this being the minimum for any system that considers all the samples. We combine the two algorithms to form a power Spectrum-based Anomaly Detector (SAD). We also describe an implementation of SAD which has minimal hardware requirements and achieves one to two orders of magnitude improvement in speed, latency, power and energy over a traditional processor-based design.
Pages: 175-182
Citations: 6
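One well-known way to keep a power spectrum up to date with constant work per bin per sample, for regularly sampled data, is the sliding DFT: each bin is updated from its previous value, the newest sample and the sample leaving the window. The Python sketch below shows that technique with a fixed power threshold as a stand-in detector. The sliding DFT, the window length and the `anomalous_bins` threshold rule are assumptions for illustration; the paper's own online algorithms, including its handling of irregularly sampled data, may differ.

```python
import cmath
from collections import deque

class SlidingDFT:
    """Maintains all N DFT bins of the most recent N samples.
    Each new sample updates every bin with one complex multiply-add,
    i.e. constant work per bin per sample (the classic sliding DFT)."""

    def __init__(self, n: int):
        self.n = n
        self.window = deque([0.0] * n, maxlen=n)  # oldest sample at index 0
        self.bins = [0j] * n                      # DFT of an all-zero window
        self.twiddles = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

    def update(self, sample):
        oldest = self.window[0]
        self.window.append(sample)                # deque drops the oldest sample
        for k in range(self.n):
            # X_k(t) = exp(+j*2*pi*k/N) * (X_k(t-1) - x(t-N) + x(t))
            self.bins[k] = self.twiddles[k] * (self.bins[k] - oldest + sample)
        return self.bins

    def power_spectrum(self):
        return [abs(b) ** 2 for b in self.bins]

def anomalous_bins(power, threshold):
    """Hypothetical detector: indices of bins whose power exceeds a threshold."""
    return [k for k, p in enumerate(power) if p > threshold]
```

Feeding a pure tone at bin 2 of an 8-point window concentrates the power in bins 2 and 6 (its mirror image for a real signal), so anomalous_bins flags exactly those.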
Analysis and optimization of a deeply pipelined FPGA soft processor
Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082783
Hui Yan Cheah, Suhaib A. Fahmy, Nachiket Kapre
FPGA soft processors have been shown to achieve high frequencies when designed around the specific capabilities of the heterogeneous resources on modern FPGAs. However, such performance comes at the cost of deep pipelines, which can result in many idle cycles when executing programs with long dependency chains in the instruction sequence. We perform a full design-space exploration of a DSP-block-based soft processor to examine the effect of pipeline depth on frequency, area, and program runtime, noting the significant number of NOPs required to resolve dependencies. We then explore the potential of a restricted data forwarding approach to improve runtime by significantly reducing NOP padding. The result is a processor that runs close to the 500 MHz fabric limit, making a case for simple data forwarding.
Pages: 235-238
Citations: 6
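The cost of deep pipelines shows up as NOP padding: in an in-order, single-issue processor without interlocks, a dependent instruction cannot issue until its producer's result is usable, so the compiler fills the gap with NOPs. The Python sketch below counts that padding for a hypothetical three-operand instruction list; the ISA encoding and the latency values are assumptions, and "result_latency" simply models how many cycles after issue a result can be consumed (longer for deeper pipelines, shorter with forwarding).

```python
def nops_required(program, result_latency):
    """Count NOPs needed so that no instruction issues before its sources are ready.

    program: list of (dest, src1, src2) register numbers (hypothetical ISA).
    result_latency: cycles after issue before a dependent instruction may issue.
    Deeper pipelines raise it; forwarding paths lower it."""
    ready_at = {}   # register -> earliest cycle a consumer may issue
    cycle = 0       # next free issue slot (single-issue, in order)
    nops = 0
    for dest, src1, src2 in program:
        issue = max(cycle, ready_at.get(src1, 0), ready_at.get(src2, 0))
        nops += issue - cycle          # idle slots filled with NOPs
        ready_at[dest] = issue + result_latency
        cycle = issue + 1
    return nops
```

For example, a three-instruction dependency chain needs 6 NOPs when results become usable 4 cycles after issue, and none when forwarding makes them usable on the next cycle; independent instructions need no padding at either latency.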
Assessing scrubbing techniques for Xilinx SRAM-based FPGAs in space applications
Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082803
Fredrik Brosser, Emil Milh, Vilhelm Geijer, P. Larsson-Edefors
SRAM-based FPGAs are becoming increasingly attractive for space applications due to their reconfigurability and signal processing capabilities, as well as their increasing speed and capacity. Traditional SRAM-based FPGAs, however, are highly sensitive to the ionizing radiation environment in space, making them prone to radiation-induced memory upsets. In this paper, we evaluate and compare scrubbing techniques for Xilinx SRAM-based FPGAs with respect to radiation-induced single event upsets (SEUs). A test framework using an exchangeable payload is developed for this purpose and run on a Xilinx Virtex-5 FPGA. We show that recent SRAM-based FPGAs can constitute a cost-efficient alternative to radiation-hardened or antifuse FPGAs for non-critical space applications such as satellite instruments.
Pages: 296-299
Citations: 27
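Scrubbing techniques generally fall into two families: blind scrubbing, which periodically rewrites every configuration frame from a golden copy, and readback scrubbing, which reads frames back, checks them, and rewrites only corrupted ones. The Python sketch below models configuration memory as a list of byte-string frames and uses a per-frame CRC-32 as the check. The frame size, the CRC choice and all function names are assumptions; real devices are read back and rewritten through their configuration interface, which this sketch does not model, and it is not the test framework from the paper.

```python
import zlib

def frame_crcs(frames):
    """Golden CRC-32 per configuration frame, computed once at design time."""
    return [zlib.crc32(frame) for frame in frames]

def blind_scrub(config, golden):
    """Blind scrubbing: rewrite every frame from the golden copy."""
    config[:] = list(golden)
    return len(golden)            # number of frames rewritten

def readback_scrub(config, golden, golden_crcs):
    """Readback scrubbing: compare each frame's CRC with the golden value and
    rewrite only the frames that differ. Returns the repaired frame indices."""
    repaired = []
    for index, frame in enumerate(config):
        if zlib.crc32(frame) != golden_crcs[index]:
            config[index] = golden[index]
            repaired.append(index)
    return repaired

def inject_seu(config, frame_index, bit):
    """Flip one configuration bit to model a single event upset."""
    frame = bytearray(config[frame_index])
    frame[bit // 8] ^= 1 << (bit % 8)
    config[frame_index] = bytes(frame)
```

The trade-off the sketch makes visible: blind scrubbing needs no checking logic but rewrites everything every pass, while readback scrubbing touches only damaged frames and can report where upsets occurred, at the cost of reading and checking the whole configuration.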
Low-latency option pricing using systolic binomial trees
Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082752
Aryan Tavakkoli, David B. Thomas
This paper presents a novel reconfigurable hardware accelerator for pricing American options using the binomial-tree model. The proposed architecture exploits both pipeline and coarse-grain parallelism in a highly efficient and scalable systolic solution, designed to exploit the large number of DSP blocks in modern architectures. The architecture can be tuned at compile time to match user requirements, from dedicating the entire FPGA to the low-latency calculation of a single option to the high-throughput concurrent evaluation of multiple options. On a Xilinx Virtex-7 xc7vx980t FPGA, this allows a single option with 768 time steps to be priced with a latency of less than 22 microseconds and a pricing rate of more than 100K options/sec. Compared to the fastest previous reconfigurable implementation of concurrent option evaluation, we achieve a 65x improvement in latency and a 9x improvement in throughput, reaching 10.7 G nodes/sec on a Virtex-4 xc4vsx55 FPGA.
Pages: 44-51
Citations: 6
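The binomial-tree model prices an option by backward induction: every node value at time step i is computed from its two children at step i+1 and, for an American option, compared with the immediate-exercise payoff. That nearest-neighbour dependence is what makes the recurrence a natural fit for a systolic array of processing elements. Below is a plain sequential Python sketch of the recurrence using the Cox-Ross-Rubinstein (CRR) parameterization; CRR and the parameter names are assumptions for illustration, and the sketch is a software reference rather than the paper's hardware architecture.

```python
import math

def binomial_option_price(s0, strike, rate, sigma, maturity, steps,
                          is_call=True, american=True):
    """Price a vanilla option on a CRR binomial tree by backward induction."""
    dt = maturity / steps
    u = math.exp(sigma * math.sqrt(dt))        # up factor
    d = 1.0 / u                                # down factor
    disc = math.exp(-rate * dt)                # one-step discount factor
    p = (math.exp(rate * dt) - d) / (u - d)    # risk-neutral up probability

    def payoff(spot):
        return max(spot - strike, 0.0) if is_call else max(strike - spot, 0.0)

    # Option values at maturity; node j has j up-moves, spot = s0 * u**(2j - n).
    values = [payoff(s0 * u ** (2 * j - steps)) for j in range(steps + 1)]
    for i in range(steps - 1, -1, -1):
        for j in range(i + 1):
            # values[j + 1] still holds step i+1 data when values[j] is overwritten.
            cont = disc * (p * values[j + 1] + (1.0 - p) * values[j])
            if american:
                cont = max(cont, payoff(s0 * u ** (2 * j - i)))  # early exercise
            values[j] = cont
    return values[0]
```

For example, a European call with s0 = strike = 100, rate = 0.05, sigma = 0.2, maturity = 1 and 768 steps lands close to the Black-Scholes value of about 10.45. The American call matches the European call because early exercise of a non-dividend call is never optimal, while the American put is worth more than the European put.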
Fanout decomposition dataflow optimizations for FPGA-based Sparse LU factorization
Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082787
Siddhartha, Nachiket Kapre
The performance of FPGA-based token dataflow architectures is often limited by the long-tail distribution of parallelism in the compute paths of dataflow graphs. This is known to limit the speedup of dataflow processing for sparse LU factorization to only 3-10x over CPUs. One reason for this limitation is the serialization penalty of processing high-fanout nodes of the dataflow graph on traditional dataflow processing architectures. In this paper, we show how to apply one-time static fanout decomposition and selective node replication transformations to input dataflow graphs. These transformations incur a one-time static compute cost that is typically amortized over millions of iterations. For dataflow graphs extracted from sparse LU factorization, we demonstrate up to 2.3x speedup (1.2x geometric mean) with this technique across a range of benchmark problems.
Pages: 252-255
Citations: 2
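Fanout decomposition can be pictured as replacing a node with many consumers by a tree of copy nodes, so that no single node has to serialize more than a fixed number of outgoing tokens. The Python sketch below assumes a dataflow graph stored as a dict from node to successor list, a hypothetical `max_fanout` bound and hypothetical copy-node names; it does not model the paper's selective node replication or any cost heuristic for choosing which nodes to transform.

```python
def decompose_fanout(graph, max_fanout):
    """Return a new graph in which no node has more than max_fanout successors,
    by inserting levels of copy nodes below each high-fanout node."""
    if max_fanout < 2:
        raise ValueError("max_fanout must be at least 2")
    result = {node: list(succs) for node, succs in graph.items()}
    for node, succs in graph.items():
        level = list(succs)
        copy_id = 0
        while len(level) > max_fanout:
            next_level = []
            for start in range(0, len(level), max_fanout):
                copy_name = f"{node}#copy{copy_id}"   # hypothetical naming scheme
                copy_id += 1
                result[copy_name] = level[start:start + max_fanout]
                next_level.append(copy_name)
            level = next_level
        result[node] = level
    return result

def reachable_sinks(graph, source):
    """Nodes without successors that can be reached from source."""
    seen, stack, sinks = set(), [source], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        succs = graph.get(node, [])
        if not succs:
            sinks.add(node)
        stack.extend(succs)
    return sinks
```

For a node with 10 consumers and max_fanout = 3, the sketch builds two levels of copy nodes (4, then 2), trading two extra hops of latency for a fanout of at most 3 everywhere while every original consumer stays reachable.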