2011 24th International Conference on VLSI Design: Latest Papers

A Scalable LDPC Decoder on GPU

2011 24th International Conference on VLSI Design

Pub Date : 2011-01-02 DOI: 10.1109/VLSID.2011.44

K. Abburi

This paper presents a flexible and scalable approach to LDPC decoding on a CUDA-based Graphics Processing Unit (GPU). Layered decoding is a popular method for LDPC decoding and is known for its fast convergence. However, implementing the layered decoding algorithm efficiently on a GPU is challenging because the algorithm offers only a limited amount of data parallelism. To overcome this problem, a kernel execution configuration that decodes multiple codewords simultaneously on the GPU is developed. The paper proposes a compact data packing scheme to reduce the number of global memory accesses and a parity-check matrix representation to reduce constant memory latency. Global memory bandwidth efficiency is improved by coalescing the simultaneous memory accesses of the threads in a half-warp into a single memory transaction. Asynchronous data transfers hide host memory latency by overlapping kernel execution with data transfers between CPU and GPU. The proposed GPU implementation of the LDPC decoder runs two orders of magnitude faster than an LDPC decoder on a CPU and four times faster than the previously reported LDPC decoder on GPU. It achieves a throughput of 160 Mbps, which is comparable to dedicated hardware solutions.
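As a rough illustration of the data packing idea mentioned in the abstract above, the following sketch packs four signed 8-bit values (standing in for quantized decoder messages) into one 32-bit word, so that a single word-sized read would fetch four values. The 8-bit width, the four-per-word layout, and the names `pack_llrs` and `unpack_llrs` are assumptions made for illustration; the abstract does not describe the paper's actual packing format.

```python
def pack_llrs(llrs):
    """Pack four signed 8-bit values into one unsigned 32-bit word.

    Hypothetical layout: value i occupies bits 8*i .. 8*i+7.
    """
    if len(llrs) != 4:
        raise ValueError("expected exactly four values")
    word = 0
    for i, value in enumerate(llrs):
        if not -128 <= value <= 127:
            raise ValueError("value out of signed 8-bit range")
        # Mask to the two's-complement byte, then shift into its slot.
        word |= (value & 0xFF) << (8 * i)
    return word


def unpack_llrs(word):
    """Recover the four signed 8-bit values from a packed 32-bit word."""
    values = []
    for i in range(4):
        byte = (word >> (8 * i)) & 0xFF
        # Bytes of 128 and above represent negative numbers.
        values.append(byte - 256 if byte >= 128 else byte)
    return values
```

Packing and then unpacking returns the original values, so the layout loses no information within the assumed 8-bit range.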

Citations: 26

A Novel Learning Framework for State Space Exploration Based on Search State Extensibility Relation

2011 24th International Conference on VLSI Design

Pub Date : 2011-01-02 DOI: 10.1109/VLSID.2011.57

M. Chandrasekar, M. Hsiao

Model checking is an effective method for design verification, useful for proving temporal properties of the underlying system. In model checking, computing the pre-image (or image) space of a given temporal property plays a critical role. In this paper, we propose a novel learning framework for efficient state space exploration based on the search state extensibility relation. The framework identifies and prunes several non-trivial redundant search spaces, thereby reducing the computational cost. We also propose a probability-based heuristic to guide our learning method. Experimental evidence is given to show the practicality of the proposed method.
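For readers unfamiliar with the pre-image computation mentioned in the abstract above, here is a minimal explicit-state sketch. It assumes the system is given as a set of `(source, destination)` transition pairs, which is an illustrative simplification; it does not reproduce the paper's learning framework, its pruning, or its heuristic, and the function names are hypothetical.

```python
def pre_image(transitions, targets):
    """Return every state with at least one transition into `targets`."""
    return {src for (src, dst) in transitions if dst in targets}


def backward_reachable(transitions, targets):
    """Repeat the pre-image step until no new states appear (a fixpoint)."""
    reached = set(targets)
    frontier = set(targets)
    while frontier:
        # Keep only states not seen before, so the loop terminates.
        frontier = pre_image(transitions, frontier) - reached
        reached |= frontier
    return reached
```

With transitions `{(0, 1), (1, 2), (2, 2), (3, 0), (4, 4)}` and target set `{2}`, one pre-image step gives `{1, 2}`, and the fixpoint is `{0, 1, 2, 3}`; state 4 can never reach the target.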

Citations: 1

An Approach to Tolerate Process Related Variations in Memristor-Based Applications

2011 24th International Conference on VLSI Design

Pub Date : 2011-01-02 DOI: 10.1109/VLSID.2011.49

Jeyavijayan Rajendran, H. Manem, R. Karri, G. Rose

Memristors have been proposed for a wide variety of applications, ranging from neural networks to memory to digital logic. Like other electronic devices, memristors are prone to process variations. We show that process-induced variation in the thickness of a memristor's oxide layer affects memristance non-linearly. We analyze the effects of process variation on memristor-based threshold gates. We propose two algorithms to tolerate variations in memristance, each based on a different constraint. One determines the memristance values that tolerate a maximum amount of variation for a given list of Boolean functions. The other determines the list of Boolean functions that can tolerate a maximum amount of variation for given memristance values. Finally, we analyze how well memristor-based threshold gates tolerate variations.
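To make the notion of a threshold gate tolerating variation concrete, the sketch below checks whether a gate still computes the same Boolean function when every weight may drift by up to a fraction `delta` of its nominal value. It assumes a generic weighted-sum gate with a fixed threshold and abstract numeric weights; how weights map to memristance values, and the paper's two algorithms, are not modeled here, and all names are hypothetical.

```python
from itertools import product


def gate_output(weights, threshold, inputs):
    """Threshold gate: output 1 when the weighted input sum reaches the threshold."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)


def tolerates(weights, threshold, delta):
    """Return True if no input pattern flips when each weight varies by +/- delta.

    The weighted sum is linear in the weights, so checking the worst-case
    low and high sums for each input pattern is sufficient.
    """
    for inputs in product((0, 1), repeat=len(weights)):
        nominal = sum(w * x for w, x in zip(weights, inputs))
        spread = delta * sum(abs(w) * x for w, x in zip(weights, inputs))
        if nominal >= threshold:
            # Output is 1 nominally; the lowest possible sum must still reach the threshold.
            if nominal - spread < threshold:
                return False
        elif nominal + spread >= threshold:
            # Output is 0 nominally; the highest possible sum must stay below the threshold.
            return False
    return True
```

For a two-input AND gate with weights `[1, 1]` and threshold `1.5`, `tolerates(..., 0.2)` holds while `tolerates(..., 0.3)` does not, because the all-ones input can then fall to 1.4, below the threshold.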

Citations: 40

AcENoCs: A Configurable HW/SW Platform for FPGA Accelerated NoC Emulation

2011 24th International Conference on VLSI Design

Pub Date : 2011-01-01 DOI: 10.1109/VLSID.2011.46

Swapnil Lotlikar, Vinay S. Pai, Paul V. Gratz

The heterogeneous nature of modern-day applications has resulted in widespread use of multicore SoC architectures. The emerging Network-on-Chip (NoC) interconnect architecture provides an energy-efficient and scalable communication solution for multiple cores, serving as a powerful replacement for traditional bus architectures. The key to the successful realization of such architectures is a flexible, fast and robust emulation platform. This paper presents the design, implementation and evaluation of AcENoCs, a flexible and cycle-accurate FPGA emulation platform for validating synchronous and GALS-based NoC architectures. The emulation platform is built around a HW/SW framework consisting of reconfigurable network components, traffic generators and ejectors, and statistics collection and analysis modules. We also address the unique features of our platform in terms of reconfigurability and co-design of the hardware and software components, and assess the performance improvements and tradeoffs over existing FPGA emulators and software simulators. Our experimental analysis indicates speedups on the order of 10000-12000X over HDL simulators and 14-47X over software simulators, without sacrificing cycle accuracy.

Citations: 33
