KEN: Kernel Extensions using Natural Language

arXiv - CS - Operating Systems | Pub Date: 2023-12-09 | DOI: arxiv-2312.05531

Yusheng Zheng, Yiwei Yang, Maolin Chen, Andrew Quinn


Abstract

The ability to modify and extend an operating system is an important feature for improving a system's security, reliability, and performance. The extended Berkeley Packet Filters (eBPF) ecosystem has emerged as the standard mechanism for extending the Linux kernel and has recently been ported to Windows. eBPF programs inject new logic into the kernel that the system executes before or after existing logic. While the eBPF ecosystem provides a flexible mechanism for kernel extension, writing eBPF programs remains difficult for developers today. An eBPF developer must have deep knowledge of operating system internals to determine where to place logic, and must also cope with the limitations that the eBPF verifier enforces on the control flow and data accesses of an eBPF program. This paper presents KEN, an alternative framework that alleviates the difficulty of writing an eBPF program by allowing Kernel Extensions to be written in Natural language. KEN uses recent advances in large language models (LLMs) to synthesize an eBPF program from a user's English-language prompt. To ensure that the LLM's output is semantically equivalent to the user's prompt, KEN employs a combination of LLM-empowered program comprehension, symbolic execution, and a series of feedback loops. KEN's key novelty is the combination of these techniques. In particular, the system uses symbolic execution in a novel structure that allows it to combine the results of program synthesis and program comprehension, building on the recent success that LLMs have shown for each of these tasks individually. To evaluate KEN, we developed a new corpus of natural language prompts for eBPF programs. We show that KEN produces correct eBPF programs for 80% of the prompts, a 2.67x improvement over an LLM-empowered program synthesis baseline.
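The feedback loops mentioned in the abstract can be pictured with a minimal sketch. This is not KEN's implementation: the function names, the toy synthesizer, and the toy checker below are all hypothetical stand-ins (for an LLM call and for the verifier plus symbolic execution, respectively), chosen only to show the shape of a synthesize, check, and retry cycle.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signatures (assumptions, not from the paper):
# a synthesizer maps (prompt, feedback so far) to candidate source text;
# a checker returns None on success or an error message on failure.
Synthesizer = Callable[[str, List[str]], str]
Checker = Callable[[str], Optional[str]]


def synthesize_with_feedback(
    prompt: str,
    synthesize: Synthesizer,
    check: Checker,
    max_rounds: int = 3,
) -> Tuple[Optional[str], List[str]]:
    """Return (program, feedback_log); program is None if no candidate passed."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        candidate = synthesize(prompt, feedback)
        error = check(candidate)
        if error is None:
            return candidate, feedback
        # Feed the failure back so the next attempt can address it.
        feedback.append(error)
    return None, feedback


def toy_synthesizer(prompt: str, feedback: List[str]) -> str:
    # Stand-in for an LLM call: adds a bounds check only after being told to.
    body = "counts[idx] += 1;"
    if any("bounds" in msg for msg in feedback):
        body = "if (idx < 16) { counts[idx] += 1; }"
    return f"/* {prompt} */ {body}"


def toy_checker(program: str) -> Optional[str]:
    # Stand-in for verification: rejects programs without the bounds check.
    if "idx < 16" not in program:
        return "unchecked array index: add a bounds check"
    return None


program, log = synthesize_with_feedback(
    "count events per CPU", toy_synthesizer, toy_checker
)
```

In this toy run the first candidate is rejected, the error message is appended to the feedback list, and the second candidate passes. A real system would replace both stubs with far richer components; the loop structure is the only part this sketch is meant to illustrate.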

