CREST: Effectively Compacting a Datastore For Retrieval-Based Speculative Decoding

arXiv - CS - Databases; Pub Date: 2024-08-08; DOI: arxiv-2408.04678

Sophia Ho, Jinsol Park, Patrick Wang


Abstract

We present CREST (Compact Retrieval-Based Speculative Decoding), a redesign of REST that allows it to be effectively "compacted". REST is a drafting technique for speculative decoding based on retrieving exact n-gram matches of the most recent n tokens generated by the target LLM from a datastore. The key idea of CREST is to only store a subset of the smallest and most common n-grams in the datastore with the hope of achieving comparable performance with less storage space. We found that storing a subset of n-grams both reduces storage space and improves performance. CREST matches REST's accepted token length with 10.6-13.5x less storage space and achieves a 16.5-17.1% higher acceptance length than REST using the same storage space on the HumanEval and MT Bench benchmarks.
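The retrieval and compaction ideas described above can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not describe the datastore layout, so the dictionary-based store, the function names (`build_datastore`, `compact`, `draft`), and the parameters (`max_n`, `draft_len`, `keep`) are all hypothetical. In particular, ranking n-grams by total occurrence count is an assumption about what "most common" means.

```python
from collections import Counter, defaultdict


def build_datastore(corpus, max_n, draft_len):
    """Map each n-gram (n <= max_n) to a Counter of the continuations seen after it."""
    store = defaultdict(Counter)
    for tokens in corpus:
        for n in range(1, max_n + 1):
            # Stop early enough that every key has at least one following token.
            for i in range(len(tokens) - n):
                key = tuple(tokens[i:i + n])
                continuation = tuple(tokens[i + n:i + n + draft_len])
                store[key][continuation] += 1
    return store


def compact(store, max_n, keep):
    """Assumed compaction rule: drop n-grams longer than max_n,
    then keep only the `keep` most frequent of the remaining ones."""
    small = {k: v for k, v in store.items() if len(k) <= max_n}
    ranked = sorted(small, key=lambda k: sum(small[k].values()), reverse=True)
    return {k: small[k] for k in ranked[:keep]}


def draft(store, context, max_n):
    """Propose draft tokens: exact match on the longest suffix of `context`
    present in the store, returning its most frequent continuation."""
    for n in range(min(max_n, len(context)), 0, -1):
        key = tuple(context[-n:])
        if key in store:
            return list(store[key].most_common(1)[0][0])
    return []  # no match: nothing to draft


corpus = [["a", "b", "c", "d"], ["a", "b", "c", "e"], ["a", "b", "c", "d"]]
store = build_datastore(corpus, max_n=2, draft_len=2)
compacted = compact(store, max_n=1, keep=2)
print(draft(store, ["x", "a", "b"], max_n=2))
print(draft(compacted, ["a", "b"], max_n=2))
```

In this sketch the compacted store holds fewer and shorter keys, so lookups fall back to shorter suffixes of the context. The drafted tokens would still need to be verified by the target LLM, as in any speculative decoding scheme.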


