用fpga加速采样排序大型数据集

Han Chen, S. Madaminov, M. Ferdman, Peter Milder
{"title":"用fpga加速采样排序大型数据集","authors":"Han Chen, S. Madaminov, M. Ferdman, Peter Milder","doi":"10.1109/FCCM.2019.00067","DOIUrl":null,"url":null,"abstract":"Sorting is a fundamental operation in many applications such as databases, search, and social networks. Although FPGAs have been shown effective at sorting data sizes that fit on chip, systems that sort larger data sets by shuffling data on and off chip are typically bottlenecked by costly merge operations or data transfer time. We propose a new approach to sorting large data sets by accelerating the samplesort algorithm using a server with a PCIe-connected FPGA. Samplesort works by randomly sampling to determine how to partition data into approximately equal-sized non-overlapping \"buckets,\" sorting each bucket, and concatenating the results. Although samplesort can partition a large problem into smaller ones that fit in the FPGA's on-chip memory, partitioning in software is slow. Our system uses a novel parallel hardware partitioner that is only limited in data set size by available FPGA hardware resources. After partitioning, each bucket is sorted using parallel sorting hardware. The CPU is responsible for sampling data, cleaning up any potential problems caused by variation in bucket size, and providing scalability by performing an initial coarse-grained partitioning when the input set is larger than the FPGA can sort. We prototype our design using Amazon Web Services FPGA instances, which pair a Xilinx Virtex UltraScale+ FPGA with a high-performance server. Our experiments demonstrate a 17.1x speedup over GNU parallel sort when sorting 2^23 key-value records and a speedup of 4.2x when sorting 2^30 records.","PeriodicalId":116955,"journal":{"name":"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Sorting Large Data Sets with FPGA-Accelerated Samplesort\",\"authors\":\"Han Chen, S. Madaminov, M. Ferdman, Peter Milder\",\"doi\":\"10.1109/FCCM.2019.00067\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Sorting is a fundamental operation in many applications such as databases, search, and social networks. Although FPGAs have been shown effective at sorting data sizes that fit on chip, systems that sort larger data sets by shuffling data on and off chip are typically bottlenecked by costly merge operations or data transfer time. We propose a new approach to sorting large data sets by accelerating the samplesort algorithm using a server with a PCIe-connected FPGA. Samplesort works by randomly sampling to determine how to partition data into approximately equal-sized non-overlapping \\\"buckets,\\\" sorting each bucket, and concatenating the results. Although samplesort can partition a large problem into smaller ones that fit in the FPGA's on-chip memory, partitioning in software is slow. Our system uses a novel parallel hardware partitioner that is only limited in data set size by available FPGA hardware resources. After partitioning, each bucket is sorted using parallel sorting hardware. The CPU is responsible for sampling data, cleaning up any potential problems caused by variation in bucket size, and providing scalability by performing an initial coarse-grained partitioning when the input set is larger than the FPGA can sort. We prototype our design using Amazon Web Services FPGA instances, which pair a Xilinx Virtex UltraScale+ FPGA with a high-performance server. Our experiments demonstrate a 17.1x speedup over GNU parallel sort when sorting 2^23 key-value records and a speedup of 4.2x when sorting 2^30 records.\",\"PeriodicalId\":116955,\"journal\":{\"name\":\"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)\",\"volume\":\"17 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-04-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/FCCM.2019.00067\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FCCM.2019.00067","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

摘要

排序是许多应用程序(如数据库、搜索和社交网络)中的基本操作。尽管fpga在对芯片上的数据大小进行排序方面已经被证明是有效的,但通过在芯片上和芯片外变换数据来对更大的数据集进行排序的系统通常会受到昂贵的合并操作或数据传输时间的瓶颈。我们提出了一种新的方法,通过使用带有pcie连接的FPGA的服务器加速采样排序算法来对大型数据集进行排序。Samplesort通过随机抽样来确定如何将数据划分为大小大致相等且不重叠的“桶”,对每个桶进行排序,并将结果连接起来。尽管samplesort可以将一个大问题划分为适合FPGA片上内存的小问题,但在软件中划分速度很慢。我们的系统使用了一种新型的并行硬件分区器,它只受可用FPGA硬件资源的限制而限制数据集的大小。分区后,使用并行排序硬件对每个桶进行排序。CPU负责对数据进行采样,清除由存储桶大小变化引起的任何潜在问题,并在输入集大于FPGA可以排序时通过执行初始粗粒度分区来提供可伸缩性。我们使用Amazon Web Services FPGA实例进行原型设计,该实例将Xilinx Virtex UltraScale+ FPGA与高性能服务器配对。我们的实验表明,当排序2^23条键值记录时,比GNU并行排序加快17.1倍,排序2^30条记录时加快4.2倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Sorting Large Data Sets with FPGA-Accelerated Samplesort
Sorting is a fundamental operation in many applications such as databases, search, and social networks. Although FPGAs have been shown effective at sorting data sizes that fit on chip, systems that sort larger data sets by shuffling data on and off chip are typically bottlenecked by costly merge operations or data transfer time. We propose a new approach to sorting large data sets by accelerating the samplesort algorithm using a server with a PCIe-connected FPGA. Samplesort works by randomly sampling to determine how to partition data into approximately equal-sized non-overlapping "buckets," sorting each bucket, and concatenating the results. Although samplesort can partition a large problem into smaller ones that fit in the FPGA's on-chip memory, partitioning in software is slow. Our system uses a novel parallel hardware partitioner that is only limited in data set size by available FPGA hardware resources. After partitioning, each bucket is sorted using parallel sorting hardware. The CPU is responsible for sampling data, cleaning up any potential problems caused by variation in bucket size, and providing scalability by performing an initial coarse-grained partitioning when the input set is larger than the FPGA can sort. We prototype our design using Amazon Web Services FPGA instances, which pair a Xilinx Virtex UltraScale+ FPGA with a high-performance server. Our experiments demonstrate a 17.1x speedup over GNU parallel sort when sorting 2^23 key-value records and a speedup of 4.2x when sorting 2^30 records.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Hardware Acceleration of Long Read Pairwise Overlapping in Genome Sequencing: A Race Between FPGA and GPU MEG: A RISCV-Based System Simulation Infrastructure for Exploring Memory Optimization Using FPGAs and Hybrid Memory Cube π-BA: Bundle Adjustment Acceleration on Embedded FPGAs with Co-observation Optimization Safe Task Interruption for FPGAs Analyzing the Energy-Efficiency of Vision Kernels on Embedded CPU, GPU and FPGA Platforms
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1