首页 > 最新文献

2011 24th Internatioal Conference on VLSI Design最新文献

英文 中文
A Scalable LDPC Decoder on GPU 基于GPU的可扩展LDPC解码器
Pub Date : 2011-01-02 DOI: 10.1109/VLSID.2011.44
K. Abburi
A flexible and scalable approach for LDPC decodingon CUDA based Graphics Processing Unit (GPU) is presented in this paper. Layered decoding is a popular method for LDPC decoding and is known for its fast convergence. However, efficient implementation of the layered decoding algorithm on GPU is challenging due to the limited amount of data-parallelism available in this algorithm. To overcome this problem, a kernel execution configuration that can decode multiple codewords simultaneously on GPU is developed. This paper proposes a compact data packing scheme to reduce the number of global memory accesses and parity-check matrix representation to reduce constant memory latency. Global memory bandwidth efficiency is improved by coalescing simultaneous memory accesses of threads in a half-warp into a single memory transaction. Asynchronous data transfers are used to hide host memory latency by overlapping kernel execution with data transfers between CPU and GPU. The proposed implementation of LDPC decoder on GPU performs two orders of magnitude faster than the LDPC decoder on a CPU and four times faster than the previously reported LDPC decoder on GPU. This implementation achieves a throughput of 160Mbps, which is comparable to dedicated hardware solutions.
本文提出了一种在基于CUDA的图形处理单元(GPU)上进行LDPC解码的灵活且可扩展的方法。分层译码是LDPC译码的一种常用方法,具有快速收敛的特点。然而,由于该算法中可用的数据并行性有限,因此在GPU上有效实现分层解码算法具有挑战性。为了解决这一问题,开发了一种可以在GPU上同时解码多个码字的内核执行配置。本文提出了一种紧凑的数据打包方案,以减少全局内存访问次数,并提出了奇偶校验矩阵表示,以减少恒定的内存延迟。全局内存带宽效率是通过将半曲线程的并发内存访问合并到单个内存事务中来提高的。异步数据传输通过在CPU和GPU之间的数据传输重叠内核执行来隐藏主机内存延迟。提出的LDPC解码器在GPU上的实现比CPU上的LDPC解码器快两个数量级,比以前报道的GPU上的LDPC解码器快4倍。该实现实现了160Mbps的吞吐量,与专用硬件解决方案相当。
{"title":"A Scalable LDPC Decoder on GPU","authors":"K. Abburi","doi":"10.1109/VLSID.2011.44","DOIUrl":"https://doi.org/10.1109/VLSID.2011.44","url":null,"abstract":"A flexible and scalable approach for LDPC decodingon CUDA based Graphics Processing Unit (GPU) is presented in this paper. Layered decoding is a popular method for LDPC decoding and is known for its fast convergence. However, efficient implementation of the layered decoding algorithm on GPU is challenging due to the limited amount of data-parallelism available in this algorithm. To overcome this problem, a kernel execution configuration that can decode multiple codewords simultaneously on GPU is developed. This paper proposes a compact data packing scheme to reduce the number of global memory accesses and parity-check matrix representation to reduce constant memory latency. Global memory bandwidth efficiency is improved by coalescing simultaneous memory accesses of threads in a half-warp into a single memory transaction. Asynchronous data transfers are used to hide host memory latency by overlapping kernel execution with data transfers between CPU and GPU. The proposed implementation of LDPC decoder on GPU performs two orders of magnitude faster than the LDPC decoder on a CPU and four times faster than the previously reported LDPC decoder on GPU. This implementation achieves a throughput of 160Mbps, which is comparable to dedicated hardware solutions.","PeriodicalId":371062,"journal":{"name":"2011 24th Internatioal Conference on VLSI Design","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127500804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 26
A Novel Learning Framework for State Space Exploration Based on Search State Extensibility Relation 一种基于搜索状态可拓关系的状态空间探索学习框架
Pub Date : 2011-01-02 DOI: 10.1109/VLSID.2011.57
M. Chandrasekar, M. Hsiao
Model Checking is an effective method for design verification, useful for proving temporal properties of the underlying system. In model checking, computing the pre-image (or image) space of a given temporal property plays a critical role. In this paper, we propose a novel learning framework for efficient state space exploration based on search state extensibility relation. This allows for the identification and pruning of several non-trivial redundant search spaces, thereby reducing the computational cost. We also propose a probability-based heuristic to guide our learning method. Experimental evidence is given to show the practicality of the proposed method.
模型检验是一种有效的设计验证方法,有助于验证底层系统的时间特性。在模型检查中,计算给定时间属性的预像(或图像)空间起着至关重要的作用。本文提出了一种基于搜索状态可拓关系的高效状态空间探索学习框架。这允许识别和修剪几个重要的冗余搜索空间,从而降低计算成本。我们还提出了一个基于概率的启发式方法来指导我们的学习方法。实验证明了该方法的实用性。
{"title":"A Novel Learning Framework for State Space Exploration Based on Search State Extensibility Relation","authors":"M. Chandrasekar, M. Hsiao","doi":"10.1109/VLSID.2011.57","DOIUrl":"https://doi.org/10.1109/VLSID.2011.57","url":null,"abstract":"Model Checking is an effective method for design verification, useful for proving temporal properties of the underlying system. In model checking, computing the pre-image (or image) space of a given temporal property plays a critical role. In this paper, we propose a novel learning framework for efficient state space exploration based on search state extensibility relation. This allows for the identification and pruning of several non-trivial redundant search spaces, thereby reducing the computational cost. We also propose a probability-based heuristic to guide our learning method. Experimental evidence is given to show the practicality of the proposed method.","PeriodicalId":371062,"journal":{"name":"2011 24th Internatioal Conference on VLSI Design","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125653769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
An Approach to Tolerate Process Related Variations in Memristor-Based Applications 在基于忆阻器的应用中容忍过程相关变化的方法
Pub Date : 2011-01-02 DOI: 10.1109/VLSID.2011.49
Jeyavijayan Rajendran, H. Manem, R. Karri, G. Rose
Memristors have been proposed to be used in a wide variety of applications ranging from neural networks to memory to digital logic. Like other electronic devices, memristors are also prone to process variations. We show that the effect of process induced variations in the thickness of the oxide layer of a memristor has a non-linear relationship with memristance. We analyze the effects of process variation on memristor-based threshold gates. We propose two algorithms to tolerate variations on memristance based on two different constraints. One is used to determine the memristance values for a given list of Boolean functions to tolerate a maximum amount of variation. The other is used to determine the list of Boolean functions that can tolerate a maximum amount of variation for given memristance values. Finally, we analyze the performance of memristor-based threshold gates to tolerate variations.
记忆电阻器已被广泛应用于从神经网络到存储器到数字逻辑的各种应用中。像其他电子设备一样,忆阻器也容易发生工艺变化。我们证明了工艺引起的忆阻器氧化层厚度变化的影响与忆阻呈非线性关系。我们分析了工艺变化对基于忆阻器的阈值门的影响。我们提出了两种算法来容忍基于两种不同约束的记忆电阻变化。一个用于确定给定布尔函数列表的忆阻值,以允许最大数量的变化。另一个用于确定布尔函数的列表,这些布尔函数可以容忍给定电阻值的最大变化量。最后,我们分析了基于忆阻器的阈值门容忍变化的性能。
{"title":"An Approach to Tolerate Process Related Variations in Memristor-Based Applications","authors":"Jeyavijayan Rajendran, H. Manem, R. Karri, G. Rose","doi":"10.1109/VLSID.2011.49","DOIUrl":"https://doi.org/10.1109/VLSID.2011.49","url":null,"abstract":"Memristors have been proposed to be used in a wide variety of applications ranging from neural networks to memory to digital logic. Like other electronic devices, memristors are also prone to process variations. We show that the effect of process induced variations in the thickness of the oxide layer of a memristor has a non-linear relationship with memristance. We analyze the effects of process variation on memristor-based threshold gates. We propose two algorithms to tolerate variations on memristance based on two different constraints. One is used to determine the memristance values for a given list of Boolean functions to tolerate a maximum amount of variation. The other is used to determine the list of Boolean functions that can tolerate a maximum amount of variation for given memristance values. Finally, we analyze the performance of memristor-based threshold gates to tolerate variations.","PeriodicalId":371062,"journal":{"name":"2011 24th Internatioal Conference on VLSI Design","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124519139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 40
AcENoCs: A Configurable HW/SW Platform for FPGA Accelerated NoC Emulation AcENoCs: FPGA加速NoC仿真的可配置软硬件平台
Pub Date : 2011-01-01 DOI: 10.1109/VLSID.2011.46
Swapnil Lotlikar, Vinay S. Pai, Paul V. Gratz
he heterogeneous nature of the modern day applications has resulted in widespread use of Multicore SoC architectures. The emerging Network-On-Chip (NoC) interconnect architecture provides an energy-efficient and scalable communication solution for multiple cores, serving as a powerful replacement for traditional bus architectures. The key to the successful realization of such architectures is a flexible, fast and robust emulation platform. This paper presents the design, implementation and evaluation of AcENoCs, a flexible and cycle-accurate FPGA emulation platform for validating synchronous and GALS-based NoC architectures. The emulation platform is built around a HWSW framework consisting of reconfigurable network components, traffic generators and ejectors, statistics collection and analysis modules. We also address the unique features of our platform in terms of reconfigurability and co-design of the hardware and software components, and assess the performance improvements and tradeoffs over existing PGA emulators and software simulators. Our experimental analysis indicate speedup improvements in the order of 10000-12000X over HDL simulators and 14-47X over software simulators, without sacrificing cycle accuracy.
现代应用的异构特性导致了多核SoC架构的广泛使用。新兴的片上网络(NoC)互连架构为多核心提供了一种节能且可扩展的通信解决方案,可以作为传统总线架构的强大替代品。灵活、快速、鲁棒的仿真平台是实现这种体系结构的关键。本文介绍了AcENoCs的设计、实现和评估,AcENoCs是一个灵活的、周期精确的FPGA仿真平台,用于验证同步和基于gals的NoC架构。仿真平台是围绕HWSW框架构建的,该框架由可重构网络组件、流量生成和输出、统计收集和分析模块组成。我们还在硬件和软件组件的可重构性和协同设计方面解决了我们平台的独特功能,并评估了现有PGA模拟器和软件模拟器的性能改进和权衡。我们的实验分析表明,在不牺牲周期精度的情况下,与HDL模拟器相比,加速提高了10,000 - 12000x,与软件模拟器相比,加速提高了14-47X。
{"title":"AcENoCs: A Configurable HW/SW Platform for FPGA Accelerated NoC Emulation","authors":"Swapnil Lotlikar, Vinay S. Pai, Paul V. Gratz","doi":"10.1109/VLSID.2011.46","DOIUrl":"https://doi.org/10.1109/VLSID.2011.46","url":null,"abstract":"he heterogeneous nature of the modern day applications has resulted in widespread use of Multicore SoC architectures. The emerging Network-On-Chip (NoC) interconnect architecture provides an energy-efficient and scalable communication solution for multiple cores, serving as a powerful replacement for traditional bus architectures. The key to the successful realization of such architectures is a flexible, fast and robust emulation platform. This paper presents the design, implementation and evaluation of AcENoCs, a flexible and cycle-accurate FPGA emulation platform for validating synchronous and GALS-based NoC architectures. The emulation platform is built around a HWSW framework consisting of reconfigurable network components, traffic generators and ejectors, statistics collection and analysis modules. We also address the unique features of our platform in terms of reconfigurability and co-design of the hardware and software components, and assess the performance improvements and tradeoffs over existing PGA emulators and software simulators. Our experimental analysis indicate speedup improvements in the order of 10000-12000X over HDL simulators and 14-47X over software simulators, without sacrificing cycle accuracy.","PeriodicalId":371062,"journal":{"name":"2011 24th Internatioal Conference on VLSI Design","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126441673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 33
期刊
2011 24th Internatioal Conference on VLSI Design
全部 Acc. Chem. Res. ACS Applied Bio Materials ACS Appl. Electron. Mater. ACS Appl. Energy Mater. ACS Appl. Mater. Interfaces ACS Appl. Nano Mater. ACS Appl. Polym. Mater. ACS BIOMATER-SCI ENG ACS Catal. ACS Cent. Sci. ACS Chem. Biol. ACS Chemical Health & Safety ACS Chem. Neurosci. ACS Comb. Sci. ACS Earth Space Chem. ACS Energy Lett. ACS Infect. Dis. ACS Macro Lett. ACS Mater. Lett. ACS Med. Chem. Lett. ACS Nano ACS Omega ACS Photonics ACS Sens. ACS Sustainable Chem. Eng. ACS Synth. Biol. Anal. Chem. BIOCHEMISTRY-US Bioconjugate Chem. BIOMACROMOLECULES Chem. Res. Toxicol. Chem. Rev. Chem. Mater. CRYST GROWTH DES ENERG FUEL Environ. Sci. Technol. Environ. Sci. Technol. Lett. Eur. J. Inorg. Chem. IND ENG CHEM RES Inorg. Chem. J. Agric. Food. Chem. J. Chem. Eng. Data J. Chem. Educ. J. Chem. Inf. Model. J. Chem. Theory Comput. J. Med. Chem. J. Nat. Prod. J PROTEOME RES J. Am. Chem. Soc. LANGMUIR MACROMOLECULES Mol. Pharmaceutics Nano Lett. Org. Lett. ORG PROCESS RES DEV ORGANOMETALLICS J. Org. Chem. J. Phys. Chem. J. Phys. Chem. A J. Phys. Chem. B J. Phys. Chem. C J. Phys. Chem. Lett. Analyst Anal. Methods Biomater. Sci. Catal. Sci. Technol. Chem. Commun. Chem. Soc. Rev. CHEM EDUC RES PRACT CRYSTENGCOMM Dalton Trans. Energy Environ. Sci. ENVIRON SCI-NANO ENVIRON SCI-PROC IMP ENVIRON SCI-WAT RES Faraday Discuss. Food Funct. Green Chem. Inorg. Chem. Front. Integr. Biol. J. Anal. At. Spectrom. J. Mater. Chem. A J. Mater. Chem. B J. Mater. Chem. C Lab Chip Mater. Chem. Front. Mater. Horiz. MEDCHEMCOMM Metallomics Mol. Biosyst. Mol. Syst. Des. Eng. Nanoscale Nanoscale Horiz. Nat. Prod. Rep. New J. Chem. Org. Biomol. Chem. Org. Chem. Front. PHOTOCH PHOTOBIO SCI PCCP Polym. Chem.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1