
Latest publications: 2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines

A Range and Scaling Study of an FPGA-Based Digital Wireless Channel Emulator
Scott Buscemi, William V. Kritikos, R. Sass
A Digital Wireless Channel Emulator (DWCE) is a system capable of emulating the RF environment for a group of wireless devices. A major issue with current designs is that they do not scale to a large enough number of nodes to emulate a meaningful network. One reason for this lack of scalability is the large amount of computation and network capacity such a system requires. Previously documented DWCE systems implement a hub-and-spoke configuration that prevents them from scaling by simply adding additional hardware. This paper investigates the use of an FPGA cluster configured as a distributed system to provide the computational and network structure needed to scale a DWCE to support 1250 wireless devices, approximately two orders of magnitude more than any previously documented system. The paper presents multiple FPGA cluster configurations that use currently available hardware and describes the algorithms used to route signals through the network and to place the computational hardware on each FPGA. The low-level VHDL Signal Path Component (SPC) is synthesized and mapped under different parameters to interpolate its resource utilization. One example FPGA build with enough SPCs to fill 80% of the FPGA resources is successfully run through the Xilinx tool-chain to determine the maximum FPGA system clock speed. Finally, scaling results are presented that detail the maximum sample frequency of various sized DWCE systems, which could be used to examine a variety of wireless devices.
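The abstract attributes the scaling problem to the sheer number of node-to-node channels an emulator must compute. The paper does not give the formula, but a full-mesh pairwise count is one simple way to see the quadratic growth it alludes to (a hypothetical illustration, not the authors' model):

```python
def channel_paths(n_nodes: int) -> int:
    """Directed node-to-node channels a full-mesh channel emulator
    must model: every ordered pair of distinct nodes."""
    return n_nodes * (n_nodes - 1)

# At the paper's target scale of 1250 devices, the emulator faces on the
# order of 1.5 million signal paths, which motivates a distributed
# FPGA cluster rather than a single hub.
print(channel_paths(1250))
```
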
DOI: 10.1109/FCCM.2013.42 · Published: 2013-04-28
Cited by: 7
Escaping the Academic Sandbox: Realizing VPR Circuits on Xilinx Devices
Eddie Hung, F. Eslami, S. Wilton
This paper presents a new, open-source method for FPGA CAD researchers to realize their techniques on real Xilinx devices. Specifically, we extend the Verilog-To-Routing (VTR) suite, which includes the VPR place-and-route CAD tool on which many FPGA innovations have been based, to generate working Xilinx bitstreams via the Xilinx Design Language (XDL). Currently, we can faithfully translate VPR's heterogeneous packing and placement results into an exact Xilinx `map' netlist, which is then routed by its `par' tool. We showcase the utility of this new method with two compelling applications targeting a 40nm Virtex-6 device: a fair comparison of the area, delay, and CAD runtime of academia's state-of-the-art VTR flow with a commercial, closed-source equivalent, along with a CAD experiment evaluated using physical measurements of on-chip power consumption and die temperature over time. This extended flow - VTR-to-Bitstream - is released to the community with the hope that it can enhance existing research projects as well as unlock new ones.
DOI: 10.1109/FCCM.2013.40 · Published: 2013-04-28
Cited by: 49
On-chip Context Save and Restore of Hardware Tasks on Partially Reconfigurable FPGAs
Aurelio Morales-Villanueva, A. Gordon-Ross
Partial reconfiguration (PR) of field-programmable gate arrays (FPGAs) enables hardware tasks to time multiplex PR regions (PRRs) by isolating reconfiguration to only the reconfigured PRR, which avoids halting the entire FPGA's execution. Time multiplexing PRRs requires support for unloading/loading tasks and for resuming a task's execution state. In order to resume a task's execution state, the execution state (context) must be saved when the task is unloaded so that it can be restored when the task resumes - context save (CS) and context restore (CR), respectively. In this paper, we present a software-based, on-chip context save and restore (CSR) for PR-capable FPGAs. As compared to prior work, our CSR is autonomous (i.e., does not require any external host support), does not require custom on-chip hardware, is portable across any system design, and does not require tool flow modifications or special tools. Experimental results extensively evaluate the CSR execution time based on PRR size, enabling designers to trade off PRR granularity for CSR execution time based on application requirements.
DOI: 10.1109/FCCM.2013.13 · Published: 2013-04-28
Cited by: 17
Birth and adolescence of reconfigurable computing: a survey of the first 20 years of field-programmable custom computing machines
Kenneth L. Pocek, R. Tessier, A. DeHon
For 20 years, the International Symposium on Field-Programmable Custom Computing Machines (FCCM) has explored how FPGAs and FPGA-like architectures can bring unique capabilities to computational tasks. We survey the evolution of the field of reconfigurable computing as reflected in FCCM, providing a guide to the body-of-knowledge accumulated in architecture, compute models, tools, run-time reconfiguration, and applications.
DOI: 10.1109/FPGA.2013.6882273 · Published: 2013-04-28
Cited by: 27
Global Atmospheric Simulation on a Reconfigurable Platform
L. Gan, H. Fu, W. Luk, Chao Yang, Wei Xue, Guangwen Yang
Summary form only given. As the only method to study long-term climate trends and to predict potential climate risk, climate modeling has become a key research topic among governments and research organizations. One of the most essential and challenging components in climate modeling is the atmospheric model. To reach high resolution in climate simulation scenarios, developers face the challenges of billions of mesh points and extremely complex algorithms. The Shallow Water Equations (SWEs) are a set of conservation laws that capture most of the essential characteristics of the atmosphere, and their study can serve as the starting point for understanding the dynamic behavior of the global atmosphere. We choose the cubed-sphere mesh as the computational mesh for its better load balance in pole regions over alternatives such as the latitude-longitude mesh. The cubed-sphere mesh is obtained by mapping a cube to the surface of the sphere; the computational domain is then six patches, each covered with N × N mesh points to be calculated. When written in local coordinates, the SWEs have an identical expression on the six patches, that is ∂Q/∂t + 1/Λ ∂(ΛF1)/∂x1 + 1/Λ ∂(ΛF2)/∂x2 + S = 0, (1) where (x1, x2) ∈ [-π/4, π/4] are the local coordinates, Q = (h, hu1, hu2)T is the prognostic variable, Fi = uiQ (i = 1, 2) are the convective fluxes, and S is the source term. Spatially discretized with a cell-centered finite volume method and integrated with a second-order accurate TVD Runge-Kutta method, the SWE solver reduces to the computation of a 13-point upwind stencil that exhibits a diamond shape. To update the prognostic components (h, hu1 and hu2) of the central point, its 12 neighboring points must be accessed. The stencil kernel includes at least 434 ADD/SUB operations, 570 multiplications, and 99 divisions; this high arithmetic density makes it difficult to implement even one kernel on a resource-limited FPGA card.
In this study, we first propose a hybrid algorithm that utilizes both CPUs and FPGAs to simulate the global SWEs. In each computational patch, most of the complicated communication happens in the two outermost layers of the boundary, whose values need to be exchanged with other patches. Therefore, we decompose each of the six patches into an outer part, consisting of the two layers of outer boundary meshes, and an inner part comprising the remainder. We assign the CPU to handle the communication and the stencil calculation of the outer part, and the FPGA to process the inner-part stencil. In this way, the FPGA and CPU work simultaneously, and the CPU time for stencil computation and communication can be hidden behind the FPGA stencil time. For the Virtex-6 SX475T used in our study, the original double-precision program would require 299% of the on-board LUTs, 283% of the FFs, and 189% of the DSPs, and cannot fit into one FPGA. To fit the SWE kernel into one FPGA chip, we apply two algorithmic optimizations to the original design: replacing certain computations with lookup tables to reduce the use of computational resources, and identifying common factors in the algorithm to eliminate redundant computation. Together these two optimizations reduce resource usage by 20%. To further reduce resource cost and fit the extremely complex stencil kernel into one FPGA chip, we optimize over a customizable space of representations and precisions: for variables with small ranges, fixed-point numbers replace double precision, while parts with large dynamic range use mixed-precision floating-point numbers. With mixed-precision floating-point and fixed-point arithmetic, we build the complex upwind stencil kernel on a single FPGA as an efficient pipeline that performs hundreds of floating-point and fixed-point arithmetic operations concurrently. Compared with our previous work [1], the solution based on one FPGA accelerator card provides 100x speedup over a 6-core CPU, and 4x speedup over a Tianhe-1A supercomputer node consisting of 12 CPU cores and one Fermi GPU.
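The 13-point diamond stencil mentioned above corresponds to the set of grid offsets within Manhattan distance 2 of the center (1 + 4 + 8 = 13 points). The paper specifies the stencil's shape and operation counts but not its coefficients, so the sketch below uses a uniform average purely as a stand-in for the real upwind weights:

```python
# Offsets of a 13-point diamond stencil: all (di, dj) with |di|+|dj| <= 2.
OFFSETS = [(di, dj) for di in range(-2, 3) for dj in range(-2, 3)
           if abs(di) + abs(dj) <= 2]

def stencil_step(grid):
    """One sweep of the diamond stencil over interior cells.
    Uniform weights are a placeholder; the actual kernel applies
    hundreds of flux/upwind operations per point."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]          # boundary cells left unchanged
    for i in range(2, n - 2):
        for j in range(2, m - 2):
            out[i][j] = sum(grid[i + di][j + dj]
                            for di, dj in OFFSETS) / len(OFFSETS)
    return out
```

The diamond footprint is what makes the hardware pipeline attractive: each output needs only a small, fixed neighborhood, so consecutive points can stream through the FPGA pipeline with modest on-chip buffering.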
DOI: 10.1109/FCCM.2013.26 · Published: 2013-04-28
Cited by: 0
A High Throughput No-Stall Golomb-Rice Hardware Decoder
R. Moussalli, W. Najjar, Xi Luo, Amna Khan
Integer compression techniques can generally be classified as bit-wise and byte-wise approaches. Though at the cost of longer processing times, bit-wise techniques typically achieve a better compression ratio. The Golomb-Rice (GR) method is a bit-wise lossless technique applied to the compression of images, audio files and lists of inverted indices. However, since GR is a serial algorithm, decompression is regarded as a very slow process; to the best of our knowledge, all existing software and hardware native (non-modified) GR decoding engines operate bit-serially on the encoded stream. In this paper, we present (1) the first no-stall hardware architecture, capable of decompressing streams of integers compressed using the GR method at a rate of several bytes (multiple integers) per hardware cycle; and (2) a novel GR decoder based on this architecture, operating at a peak rate of one integer per cycle. A thorough design space exploration study of the resulting resource utilization and throughput of the aforementioned approaches is presented. Furthermore, a performance study is provided, comparing software approaches to implementations of the novel hardware decoders. While occupying 10% of a Xilinx V6LX240T FPGA, the no-stall architecture core achieves a sustained throughput of over 7 Gbps.
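To see why native GR decoding is inherently bit-serial, consider the Rice special case (divisor 2^k): each codeword is a unary quotient terminated by a '0', followed by a k-bit remainder, so a decoder cannot know where one codeword ends without scanning it bit by bit. A minimal software sketch (the bit-serial baseline the paper's hardware outperforms, not the paper's decoder):

```python
def rice_encode(n: int, k: int) -> str:
    """Rice-encode a non-negative integer with parameter k (divisor 2**k)."""
    q, r = n >> k, n & ((1 << k) - 1)
    bits = "1" * q + "0"                 # unary quotient, '0' terminator
    if k:
        bits += format(r, f"0{k}b")      # k-bit binary remainder
    return bits

def rice_decode(bits: str, k: int) -> tuple[int, str]:
    """Decode one codeword bit-serially; return (value, remaining bits)."""
    q = 0
    while bits[q] == "1":                # scan the unary prefix one bit at a time
        q += 1
    r = int(bits[q + 1: q + 1 + k], 2) if k else 0
    return (q << k) | r, bits[q + 1 + k:]
```

For example, `rice_encode(9, 2)` yields `"11001"` (quotient 2 in unary, remainder 1 in two bits). The variable codeword length is exactly the dependency the no-stall architecture must break to emit multiple integers per cycle.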
DOI: 10.1109/FCCM.2013.9 · Published: 2013-04-28
Cited by: 13
A Fast and Accurate FPGA-Based Fault Injection System
Thomas Schweizer, Dustin Peterson, Johannes Maximilian Kühn, T. Kuhn, W. Rosenstiel
This paper introduces an FPGA-based fault injection system. To realize this system, a library was developed that implements a static mapping between a circuit described at RTL or gate level and its corresponding placed-and-routed FPGA design. The aim of this mapping is to relate the module and port structure of the placed-and-routed FPGA design back to the RT/gate-level circuit description. To demonstrate the accuracy of this mapping, the ISCAS'89 benchmark circuits and the VHDL netlist of the LEON3 system are used. The results show that about 99% of the ports in the RT/gate-level circuit description can be located in the placed-and-routed FPGA design. Based on this library, a fault injection tool was developed to accelerate fault injection experiments by bypassing several stages (synthesis, placement, and routing) of the re-compilation process. In these experiments, a 12× speedup was achieved compared to fault injection based on serial fault emulation.
DOI: 10.1109/FCCM.2013.47 · Published: 2013-04-28
Cited by: 8
An Approach to a Fully Automated Partial Reconfiguration Design Flow
Kizheppatt Vipin, Suhaib A. Fahmy
Adoption of partial reconfiguration (PR) in mainstream FPGA system design remains underwhelming, primarily due to the significant FPGA design expertise that is required. We present an approach to fully automating a design flow that accepts a high-level description of a dynamically adaptive application and generates a fully functional, optimised PR design. This tool can determine the most suitable FPGA for a design to meet a given reconfiguration time constraint and makes full use of available resources. The flow targets adaptive systems, where the dynamic behaviour and switching order are not known up front.
DOI: 10.1109/FCCM.2013.33 · Published: 2013-04-28
Citations: 2
An FPGA-Based Data Flow Engine for Gaussian Copula Model
Huabin Ruan, Xiaomeng Huang, H. Fu, Guangwen Yang, W. Luk, S. Racanière, O. Pell, Wenji Han
The Gaussian Copula Model (GCM) plays an important role in state-of-the-art financial analysis for modeling the dependence of financial assets. However, the existing implementations of GCM are all computationally demanding and time-consuming. In this paper, we propose a Dataflow Engine (DFE) design to accelerate the GCM computation. Specifically, a commonly used CPU-friendly GCM algorithm is converted into a fully-pipelined dataflow graph through four steps of optimization: recomposing the algorithm to be pipeline-friendly, removing unnecessary computation, sharing common computing results, and reducing the computing precision while maintaining the same level of accuracy in the computation results. The performance of the proposed DFE design is compared with three well-optimized CPU-based implementations. Experimental results show that our DFE solution not only generates fairly accurate results, but also achieves a maximum of 467x speedup over a single-thread CPU-based solution, 120x speedup over a multi-thread CPU-based solution, and 47x speedup over an MPI-based solution.
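For context, the dependence step that a Gaussian copula introduces can be sketched in a few lines of plain Python. This is a generic textbook sketch, not the paper's FPGA pipeline; the correlation value and seed are illustrative:

```python
# Textbook Gaussian copula sampling: correlate independent normals via a
# Cholesky factor, then map each through the standard normal CDF to get
# uniform marginals with the desired dependence structure.
import math
import random

def gaussian_copula_sample(chol_lower, rng):
    """chol_lower: lower-triangular Cholesky factor of the correlation matrix."""
    n = len(chol_lower)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]          # independent N(0,1)
    x = [sum(row[j] * z[j] for j in range(i + 1))        # x = L @ z
         for i, row in enumerate(chol_lower)]
    phi = lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))  # standard normal CDF
    return [phi(v) for v in x]                           # uniforms on (0, 1)

# Two-asset example with correlation 0.6 (illustrative numbers)
rho = 0.6
L = [[1.0, 0.0], [rho, math.sqrt(1.0 - rho * rho)]]
u = gaussian_copula_sample(L, random.Random(42))
```

The resulting uniforms `u` can then be pushed through any marginal distributions (credit default times, asset returns), which is exactly the dependence-modeling role the abstract describes.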
DOI: 10.1109/FCCM.2013.14 · Published 2013-04-28
Citations: 2
A Multithreaded VLIW Soft Processor Family
Kalin Ovtcharov, Ilian Tili, J. Steffan
Summary form only given. There is growing commercial interest in using FPGAs for compute acceleration. To ease the programming task for non-hardware-expert programmers, systems are emerging that can map high-level languages such as C and OpenCL to FPGAs, targeting compiler-generated circuits and soft processing engines. Soft processing engines such as CPUs are familiar to programmers, can be reprogrammed quickly without rebuilding the FPGA image, and by their general nature can support multiple software functions in a smaller area than the alternative of multiple per-function synthesized circuits. Finally, compelling processing engines can be incorporated into the output of high-level synthesis systems. For FPGA-based soft compute engines to be compelling they must be computationally dense: they must achieve high throughput per area. For simple CPUs with simple functional units (FUs) it is relatively straightforward to achieve good utilization, and it is not overly detrimental if a small, single-pipeline-stage FU such as an integer adder is under-utilized. In contrast, larger, more deeply pipelined, more numerous, and more varied FUs can be quite challenging to keep busy, even for an engine capable of extracting instruction-level parallelism (ILP) from an application. Hence a key challenge for FPGA-based compute engines is how to maximize compute density (throughput per area) by achieving high utilization of a datapath composed of multiple varying FUs of significant and varying pipeline depth. In this work, we propose a highly parameterizable template architecture of a multithreaded FPGA-based compute engine designed to highly utilize varied and deeply pipelined FUs. Our approach to achieving high utilization is to leverage (i) support for multiple thread contexts, (ii) thread-level and instruction-level parallelism, and (iii) static compiler analysis and scheduling.
We focus on deeply pipelined, IEEE-754 floating-point FUs of widely varying latency, executing both Hodgkin-Huxley neuron simulation and Black-Scholes options pricing models as example applications, compiled with our LLVM-based scheduler. Targeting a Stratix IV FPGA, we explore architectural tradeoffs by measuring area and throughput for designs with varying numbers of FUs, thread contexts (T), memory banks (B), and bank multi-porting. To determine the most efficient designs suitable for replication, we measure compute density (application throughput per unit of FPGA area) and report which architectural choices lead to the most computationally dense designs. The most computationally dense design is not necessarily the one with the highest throughput, and (i) for maximizing throughput, having each thread reside in its own bank is best; (ii) when only moderate numbers of independent threads are available, the compute engine has higher compute density than a custom hardware implementation, e.g., 2.3x for 32 threads; (iii) the best FU mix does not necessarily match the FU usage in the application's dataflow graph.
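As context for one of the example workloads, the closed-form Black-Scholes call price the abstract mentions can be sketched as follows. This is the standard textbook formula, not the paper's FPGA implementation; the input values are illustrative:

```python
# Closed-form Black-Scholes European call price, the floating-point-heavy
# kernel used as one of the paper's example applications.
import math

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S: float, K: float, T: float, r: float, sigma: float) -> float:
    """S: spot, K: strike, T: years to expiry, r: risk-free rate, sigma: volatility."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

price = bs_call(S=100.0, K=100.0, T=1.0, r=0.05, sigma=0.2)  # ~10.45
```

The mix of `log`, `exp`, `erf`, `sqrt`, divides, and multiplies is exactly the kind of varied-latency, deeply pipelined floating-point FU population the abstract says is hard to keep fully utilized.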
DOI: 10.1109/FCCM.2013.36 · Published 2013-04-28
Citations: 1
2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines