Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays: Latest Papers

FPGA-based biophysically-meaningful modeling of olivocerebellar neurons
Georgios Smaragdos, S. Isaza, M. F. V. Eijk, I. Sourdis, C. Strydis
The Inferior-Olivary nucleus (ION) is a well-charted region of the brain, heavily associated with sensorimotor control of the body. It comprises ION cells with unique properties which facilitate sensory processing and motor-learning skills. Various simulation models of ION-cell networks have been written in an attempt to unravel their mysteries. However, simulations rapidly become intractable when biophysically plausible models and meaningful network sizes (>=100 cells) are modeled. To overcome this problem, in this work we port a highly detailed ION cell network model, originally coded in Matlab, onto an FPGA chip. It was first converted to ANSI C code and extensively profiled. It was then translated to HLS C code for the Xilinx Vivado toolflow, and various algorithmic and arithmetic optimizations were applied. The design was implemented in a Virtex 7 (XC7VX485T) device and can simulate a 96-cell network at real-time speed, yielding a speedup of 700x compared to the original Matlab code and 12.5x compared to the reference C implementation running on an Intel Xeon 2.66GHz machine with 20GB RAM. For a 1,056-cell network (non-real-time), an FPGA speedup of 45x against the C code can be achieved, demonstrating the design's usefulness in accelerating neuroscience research. Limited by the available on-chip memory, the FPGA can support at most a 14,400-cell network (non-real-time) with online parameter configurability for cell state and network size. The maximum throughput of the FPGA ION-network accelerator can reach 2.13 GFLOPS.
DOI: https://doi.org/10.1145/2554688.2554790 (published 2014-02-26)
Citations: 48
Dynamic voltage & frequency scaling with online slack measurement
Joshua M. Levine, Edward A. Stott, P. Cheung
Timing margins in FPGAs are already significant and as process scaling continues they will have to grow to guarantee operation under increased variation. Margins enforce worst-case operation even in typical conditions and result in devices operating more slowly and consuming more energy than necessary. This paper presents a method of dynamic voltage and frequency scaling that uses online slack measurement to determine timing headroom in a circuit while it is operating and scale the voltage and/or frequency in response. Doing so can significantly reduce power consumption or increase throughput with a minimal overhead. The method is demonstrated on a number of benchmark circuits under a range of operating conditions, constraints and optimisation targets.
DOI: https://doi.org/10.1145/2554688.2554784 (published 2014-02-26)
Citations: 39
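
The closed loop described in the abstract above (measure slack while the circuit runs, then scale voltage and/or frequency in response) can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function name `next_frequency_mhz`, the guard band, the step size, and the slack readings are all hypothetical, and only frequency is scaled here.

```python
def next_frequency_mhz(freq_mhz, slack_ns, guard_ns=0.2, step_mhz=5.0):
    """Return the next clock frequency given the measured timing slack.

    All parameter values are hypothetical illustration values.
    """
    if slack_ns > guard_ns:
        # Headroom beyond the guard band: the clock can be raised.
        return freq_mhz + step_mhz
    if slack_ns < 0:
        # Negative slack: timing is violated, so back off.
        return freq_mhz - step_mhz
    # Slack is inside the guard band: hold the current setting.
    return freq_mhz


freq = 100.0
for measured_slack in [1.0, 0.6, 0.1, -0.05]:  # made-up readings in ns
    freq = next_frequency_mhz(freq, measured_slack)
```

The guard band keeps the loop from oscillating around zero slack; a real controller would also have to account for measurement latency and voltage settling time.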
Fast and effective placement and routing directed high-level synthesis for FPGAs
Hongbin Zheng, S. Gurumani, K. Rupnow, Deming Chen
Achievable frequency (fmax) is a widely used input constraint for designs targeting Field-Programmable Gate Arrays (FPGAs), because of its impact on design latency and throughput. Fmax is limited by critical path delay, which is highly influenced by lower-level details of the circuit implementation such as technology mapping, placement and routing. However, for high-level synthesis (HLS) design flows, it is challenging to evaluate the real critical delay at the behavioral level. Current HLS flows typically use module pre-characterization for delay estimates. However, we will demonstrate that such delay estimates are not sufficient to obtain high fmax and also minimize total execution latency. In this paper, we introduce a new HLS flow that integrates with Altera's Quartus synthesis and fast placement and routing (PAR) tool to obtain realistic post-PAR delay estimates. This integration enables an iterative flow that improves the performance of the design with both behavioral-level and circuit-level optimizations using realistic delay information. We demonstrate that our HLS flow produces up to 24% (on average 20%) improvement in fmax and up to 22% (on average 20%) improvement in execution latency. Furthermore, results demonstrate that our flow is able to achieve from 65% to 91% of the theoretical fmax on Stratix IV devices (550MHz).
DOI: https://doi.org/10.1145/2554688.2554775 (published 2014-02-26)
Citations: 38
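
To make the relationship between critical path delay and fmax in the abstract above concrete, the sketch below uses the standard relation fmax = 1 / (critical path delay) and works out what 65% to 91% of the quoted 550 MHz figure means in absolute terms. The helper name `fmax_mhz` and the 2 ns example delay are assumptions for illustration; nothing here models the authors' HLS flow.

```python
def fmax_mhz(critical_path_ns):
    """Maximum clock frequency in MHz for a critical path delay in ns."""
    return 1000.0 / critical_path_ns


theoretical_fmax_mhz = 550.0  # Stratix IV figure quoted in the abstract

# Range reported in the abstract: 65% to 91% of the theoretical fmax.
low_mhz = 0.65 * theoretical_fmax_mhz
high_mhz = 0.91 * theoretical_fmax_mhz

# A hypothetical 2 ns critical path would cap the clock at 500 MHz.
example_mhz = fmax_mhz(2.0)
```

Under these numbers the reported range corresponds to roughly 357.5 MHz to 500.5 MHz.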
Session details: Tools and models 1
Deming Chen
DOI: https://doi.org/10.1145/3260942 (published 2014-02-26)
Citations: 0
A scalable routability-driven analytical placer with global router integration for FPGAs (abstract only)
Ka-Chun Lam, W. Tang, Evangeline F. Y. Young
As modern circuits grow larger, implementing them on FPGAs becomes arduous. The state-of-the-art academic FPGA place-and-route tool, VPR, produces good-quality results but needs around a whole day to complete a placement when the input circuit contains millions of lookup tables, excluding the runtime for routing. To expedite the placement process, we propose a routability-driven placement algorithm for FPGAs that adopts techniques used in ASIC global placers. Our placer follows the lower-bound-and-upper-bound iterative optimization process in ASIC placers like Ripple. In the lower-bound computation, the total HPWL, modeled using the Bound2Bound net model, is minimized using the conjugate gradient method. In the upper-bound computation, an almost-legalized result is produced by spreading cells linearly in the placement area. Those positions then serve as fixed-point anchors and are fed into the next lower-bound computation. Furthermore, global routing is performed in the upper-bound computation to estimate the routing segment usage, as a means to consider congestion in placement. We tested our approach using 20 MCNC benchmarks and 4 large benchmarks for performance and scalability. Experimental results show that, based on the island-style architecture for which VPR is most optimized, our approach can obtain a placement result 8x faster than VPR with 2% more channel width, or 3x faster with 1% more channel width when congestion is considered. Our approach is even 14x faster than VPR in placing large benchmarks with over 10,000 lookup tables, with only 7% more channel width.
DOI: https://doi.org/10.1145/2554688.2554711 (published 2014-02-26)
Citations: 0
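
The lower-bound step in the abstract above minimizes total half-perimeter wirelength (HPWL). As a minimal sketch of the metric itself (not of the Bound2Bound net model or the conjugate gradient solver the authors use), the HPWL of a net is the half-perimeter of the bounding box around its pins. The function names and the pin coordinates below are made up for illustration.

```python
def hpwl(pins):
    """Half-perimeter wirelength of one net; pins is a list of (x, y)."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    # Width plus height of the bounding box around all pins.
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets):
    """Sum of HPWL over all nets in a placement."""
    return sum(hpwl(pins) for pins in nets)


# Two hypothetical nets: a 2-pin net and a 3-pin net.
nets = [[(0, 0), (3, 4)], [(1, 1), (2, 5), (6, 2)]]
total = total_hpwl(nets)
```

Because `max` and `min` are not differentiable, analytical placers typically minimize a smooth or quadratic surrogate of this quantity rather than HPWL directly.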
Coordinating routing resources for hex pips test in island-style FPGAs (abstract only)
Fan Zhang, Lei Chen, Wenyao Xu, Yuanfu Zhao, Zhiping Wen
The significance of FPGA testing and the challenge of its rising cost cannot be ignored. In island-style FPGA architectures, hex lines are the principal interconnect resources. Testing hex lines and hex Programmable Interconnect Points (PIPs) has remained the major technical difficulty in FPGA testing because of complex interconnect rules. In particular, testing hex PIPs in the oblique direction has rarely been addressed in previous studies. To address this challenge, this paper proposes, for the first time, a coordinate system and formulates the interconnect rules of hex lines as mathematical equations. For hex PIPs in the horizontal and vertical directions, an efficient circle test structure is formed from the coordinate equations. For hex PIPs in the oblique direction, the coordinate method is used to generate the partial-cascade pattern. The corresponding test vector is also generated, which ensures the ergodicity of hex PIPs in the oblique direction. In addition to hex PIPs, hex lines are also covered without extra effort. Compared to previous research, the configuration number for hex lines is decreased significantly. We evaluate this method on a Xilinx XC2V1000, and experimental results show that our proposed method achieves 100% fault coverage for hex PIPs and can be applied generally to current mainstream island-style FPGAs with a similar interconnect structure.
DOI: https://doi.org/10.1145/2554688.2554740 (published 2014-02-26)
Citations: 0
Memory block based scan-BIST architecture for application-dependent FPGA testing
Keita Ito, T. Yoneda, Yuta Yamato, K. Hatayama, M. Inoue
This paper presents a scan-based BIST architecture for FPGAs used as application-specific embedded devices for low-volume products. The proposed architecture efficiently utilizes memory blocks, instead of logic elements, to build up BIST components such as LFSR, MISR and scan chains for test points. It also provides enhanced scan functionality for test points and performs a hybrid test application of LOC and enhanced scan to improve delay test quality. Experimental results show that the proposed BIST architecture achieves high delay test quality with efficient resource utilization.
DOI: https://doi.org/10.1145/2554688.2554764 (published 2014-02-26)
Citations: 1
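
For readers unfamiliar with the BIST components named in the abstract above, the sketch below shows the behavior of a small LFSR (pattern generator) and MISR (response compactor) in software. It is a behavioral illustration only, under assumed parameters: the 4-bit width, the tap positions, and the response values are hypothetical, and it says nothing about how the authors map these structures onto FPGA memory blocks.

```python
def lfsr_step(state):
    """One step of a 4-bit Fibonacci LFSR with taps on the two top bits."""
    feedback = ((state >> 3) ^ (state >> 2)) & 1
    return ((state << 1) | feedback) & 0xF


def lfsr_period(seed=1):
    """Count the steps until the LFSR returns to its seed."""
    state, steps = lfsr_step(seed), 1
    while state != seed:
        state = lfsr_step(state)
        steps += 1
    return steps


def misr_step(state, response):
    """Shift the register, then fold in a 4-bit circuit response."""
    return lfsr_step(state) ^ (response & 0xF)


def signature(responses, seed=0):
    """Compact a sequence of responses into one 4-bit signature."""
    state = seed
    for r in responses:
        state = misr_step(state, r)
    return state


good = signature([0x3, 0xA, 0x5])    # made-up fault-free responses
faulty = signature([0x3, 0xB, 0x5])  # one response bit flipped
```

With these taps the LFSR cycles through all 15 non-zero states, and the single flipped response bit yields a different signature, which is how a MISR flags a faulty circuit.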
Exploring duty cycle distortions along signal paths in FPGAs (abstract only)
Matthias Hinkfoth, R. Joost, R. Salomon
Non-trivial hardware architectures consist of a significant number of fine-grained modules that communicate with each other via dedicated signal lines. In field-programmable gate arrays (FPGAs), these communication lines are provided in the form of global vertical and horizontal routing channels, and are subject to the routing process. Since the effects of physical properties on the signal skew along these lines are well understood, this paper investigates the observable effects on a signal's duty cycle. Practical experiments show that the distortion of the duty cycle progressively increases along such wires (connections) and that, in the extreme case, a signal may entirely vanish.
DOI: https://doi.org/10.1145/2554688.2554737 (published 2014-02-26)
Citations: 0
Using high-level synthesis and formal analysis to predict and preempt attacks on industrial control systems
L. Lerner, Zane R. Franklin, W. Baumann, C. Patterson
Industrial control systems (ICSes) have the conflicting requirements of security and network access. In the event of large-scale hostilities, factories and infrastructure would more likely be targeted by computer viruses than the bomber squadrons used in WWII. ICS zero-day exploits are now a commodity sold on brokerages to interested parties including nations. We mitigate these threats not by bolstering perimeter security, but rather by assuming that potentially all layers of ICS software have already been compromised and are capable of launching a latent attack while reporting normal system status to human operators. In our approach, application-specific configurable hardware is the final authority for scrutinizing controller commands and process sensors, and can monitor and override operations at the lowest (I/O pin) level of a configurable system-on-chip platform. The process specifications, stability-preserving backup controller, and switchover logic are specified and formally verified as C code, and synthesized into hardware to resist software reconfiguration attacks. To provide greater assurance that the backup controller can be invoked before the physical process becomes unstable, copies of the production controller task and plant model are accelerated to preview the controller's behavior in the near future.
DOI: https://doi.org/10.1145/2554688.2554759 (published 2014-02-26)
Citations: 11
Wordwidth, instructions, looping, and virtualization: the role of sharing in absolute energy minimization
A. DeHon
When are FPGAs more energy efficient than processors? This question is complicated by technology factors and the wide range of application characteristics that can be exploited to minimize energy. Using a wire-dominated energy model to estimate the absolute energy required for programmable computations, we determine when spatially organized programmable computations (FPGAs) require less energy than temporally organized programmable computations (processors). The point of crossover will depend on the metal layers available, the locality, the SIMD wordwidth regularity, and the compactness of the instructions. When the Rent Exponent, p, is less than 0.7, the spatial design is always more energy efficient. When p=0.8, the technology offers 8-metal layers for routing, and data can be organized into 16b words and processed in tight loops of no more than 128 instructions, the temporal design uses less energy when the number of LUTs is greater than 64K. We further show that heterogeneous multicontext architectures can use even less energy than the p=0.8, 16b word temporal case.
DOI: 10.1145/2554688.2554781 (https://doi.org/10.1145/2554688.2554781). Published: 2014-02-26.
Citations: 2
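The Rent exponent p in the DeHon abstract above measures interconnect locality. As a rough illustration only, the sketch below evaluates the standard form of Rent's rule, T = c · N^p, for the two exponents the abstract mentions. It is not the paper's wire-dominated energy model; the function name `rent_terminals` and the constant `c = 1.0` are assumptions made for this example.

```python
def rent_terminals(n_blocks: int, p: float, c: float = 1.0) -> float:
    """Rent's rule: estimated external connections T = c * N**p
    for a region containing n_blocks logic blocks.

    c is a hypothetical per-block terminal constant (assumed 1.0 here).
    """
    if n_blocks <= 0:
        raise ValueError("n_blocks must be positive")
    return c * n_blocks ** p


if __name__ == "__main__":
    n = 64 * 1024  # the 64K-LUT design size mentioned in the abstract
    for p in (0.7, 0.8):
        # A larger p means less locality, so more signals cross the region boundary.
        print(f"p={p}: T = {rent_terminals(n, p):.0f} terminals")
```

A higher exponent means more signals leave any given region of the design, so more wiring is needed. This is consistent with the abstract's observation that the spatial design is always more energy efficient when p is below 0.7, while at p=0.8 the outcome depends on design size and the other stated conditions.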