KEN: Kernel Extensions using Natural Language
Yusheng Zheng, Yiwei Yang, Maolin Chen, Andrew Quinn
arXiv - CS - Operating Systems, 2023-12-09 (arxiv-2312.05531)
Abstract
The ability to modify and extend an operating system is an important feature
for improving a system's security, reliability, and performance. The extended
Berkeley Packet Filter (eBPF) ecosystem has emerged as the standard mechanism
for extending the Linux kernel and has recently been ported to Windows. eBPF
programs inject new logic into the kernel that the system will execute before
or after existing logic. While the eBPF ecosystem provides a flexible mechanism
for kernel extension, it is difficult for developers to write eBPF programs
today. An eBPF developer must have deep knowledge of the internals of the
operating system to determine where to place logic and cope with programming
limitations on the control flow and data accesses of their eBPF program
enforced by the eBPF verifier. This paper presents KEN, an alternative
framework that alleviates the difficulty of writing an eBPF program by allowing
Kernel Extensions to be written in Natural language. KEN uses recent advances
in large language models (LLMs) to synthesize an eBPF program given a user's
English-language prompt. To ensure that the LLM's output is semantically equivalent
to the user's prompt, KEN employs a combination of LLM-empowered program
comprehension, symbolic execution, and a series of feedback loops. KEN's key
novelty is the combination of these techniques. In particular, the system uses
symbolic execution in a novel structure that allows it to combine the results
of program synthesis and program comprehension and build on the recent success
that LLMs have shown for each of these tasks individually. To evaluate KEN, we
developed a new corpus of natural language prompts for eBPF programs. We show
that KEN produces correct eBPF programs for 80% of the prompts, which is an
improvement by a factor of 2.67 over an LLM-empowered program synthesis baseline.
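
The abstract describes a synthesis step whose output is checked and whose failures are fed back into the next attempt. The sketch below illustrates only that general generate-check-retry pattern; it is not KEN's implementation. Every name in it (`synthesize_with_feedback`, `toy_synthesize`, `toy_check`) is hypothetical, and the two toy functions are stand-ins for an LLM call and for a verifier or symbolic-execution check, respectively.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures for illustration; none of these names come from the paper.
Synthesizer = Callable[[str, Optional[str]], str]
Checker = Callable[[str], Tuple[bool, str]]


def synthesize_with_feedback(
    prompt: str,
    synthesize: Synthesizer,
    check: Checker,
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Generate a candidate, check it, and retry with the checker's message.

    Returns (program, attempts_used); program is None if every attempt failed.
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        candidate = synthesize(prompt, feedback)
        ok, message = check(candidate)
        if ok:
            return candidate, attempt
        # Carry the failure message into the next synthesis attempt.
        feedback = message
    return None, max_attempts


def toy_synthesize(prompt: str, feedback: Optional[str]) -> str:
    # Stand-in for an LLM call (assumption): the first draft uses an
    # unbounded loop; once feedback arrives, it emits a bounded loop.
    if feedback is None:
        return "while (1) { count++; }"
    return "for (int i = 0; i < 16; i++) { count++; }"


def toy_check(candidate: str) -> Tuple[bool, str]:
    # Stand-in for a verifier-style check (assumption): reject unbounded loops.
    if "while (1)" in candidate:
        return False, "unbounded loop rejected"
    return True, "ok"


program, attempts = synthesize_with_feedback(
    "count events", toy_synthesize, toy_check
)
```

In this toy run the first candidate is rejected and the second is accepted, so the loop stops after two attempts. A real system would replace the string check with actual verification and would likely distinguish between kinds of failure when building the feedback message.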