HAWK: Hardware support for unstructured log processing

2016 IEEE 32nd International Conference on Data Engineering (ICDE). Pub Date: 2016-05-16. DOI: 10.1109/ICDE.2016.7498263

Prateek Tandon, Faissal M. Sleiman, Michael J. Cafarella, T. Wenisch

Pages: 469-480

Citations: 22

Abstract

Rapidly processing high-velocity text data is critical for many technical and business applications. Widely used software solutions for processing these large text corpora target disk-resident data and rely on pre-computed indexes and large clusters to achieve high performance. However, greater capacity and falling costs are enabling a shift to RAM-resident data sets. The enormous bandwidth of RAM can facilitate scan operations that are competitive with pre-computed indexes for interactive, ad hoc queries. However, software approaches for processing these large text corpora fall far short of saturating available bandwidth and meeting the peak scan rates possible on modern memory systems. In this paper, we present HAWK, a hardware accelerator for ad hoc queries against large in-memory logs. HAWK comprises a stall-free hardware pipeline that scans input data at a constant rate, examining multiple input characters in parallel during a single accelerator clock cycle. We describe a 1 GHz, 32-character-wide HAWK design targeting ASIC implementation, designed to process data at 32 GB/s (up to two orders of magnitude faster than software solutions), and demonstrate a scaled-down FPGA prototype that operates at 100 MHz with 4-wide parallelism and processes data at 400 MB/s (13× faster than software grep for large multi-pattern scans).
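The two throughput figures in the abstract follow from multiplying clock rate by the number of characters examined per cycle. The short sketch below reproduces that arithmetic. It assumes one byte per character, decimal units (1 GB = 10^9 bytes), and a pipeline that never stalls; the function name `scan_throughput_bytes_per_s` is hypothetical and not taken from the paper.

```python
def scan_throughput_bytes_per_s(clock_hz: float, chars_per_cycle: int,
                                bytes_per_char: int = 1) -> float:
    """Peak scan rate of a stall-free pipeline.

    Assumption (not from the paper's text): each character is one byte,
    and the pipeline consumes exactly `chars_per_cycle` characters on
    every clock cycle.
    """
    return clock_hz * chars_per_cycle * bytes_per_char


# Parameters quoted in the abstract.
asic = scan_throughput_bytes_per_s(1e9, 32)    # 1 GHz, 32 characters wide
fpga = scan_throughput_bytes_per_s(100e6, 4)   # 100 MHz, 4 characters wide

print(f"ASIC design:    {asic / 1e9:.0f} GB/s")
print(f"FPGA prototype: {fpga / 1e6:.0f} MB/s")
```

Under these assumptions the results match the 32 GB/s and 400 MB/s quoted above. Because the scan rate is constant, it depends only on width and clock frequency, not on the content of the log or on the number of patterns.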


