System Virtualization for Neural Processing Units
Yuqi Xue, Yiqi Liu, Jian Huang
DOI: https://doi.org/10.1145/3593856.3595912
Modern cloud platforms have been employing hardware accelerators such as neural processing units (NPUs) to meet the increasing demand for computing resources from AI-based application services. However, due to the lack of system virtualization support, the current way of using NPUs in cloud platforms suffers from either low resource utilization or poor isolation between multi-tenant application services. In this paper, we investigate system virtualization techniques for NPUs across the entire software and hardware stack, and present our NPU virtualization solution, named NeuCloud. We propose a flexible NPU abstraction named vNPU that allows fine-grained NPU virtualization and resource management. Building on this abstraction, we design vNPU allocation, mapping, and scheduling policies that maximize resource utilization while achieving both performance and security isolation for vNPU instances at runtime.
{"title":"System Virtualization for Neural Processing Units","authors":"Yu Xue, Yiqi Liu, Jian Huang","doi":"10.1145/3593856.3595912","DOIUrl":"https://doi.org/10.1145/3593856.3595912","url":null,"abstract":"Modern cloud platforms have been employing hardware accelerators such as neural processing units (NPUs) to meet the increasing demand for computing resources for AI-based application services. However, due to the lack of system virtualization support, the current way of using NPUs in cloud platforms suffers from either low resource utilization or poor isolation between multi-tenant application services. In this paper, we investigate the system virtualization techniques for NPUs across the entire software and hardware stack, and present our NPU virtualization solution named NeuCloud. We propose a flexible NPU abstraction named vNPU that allows fine-grained NPU virtualization and resource management. We leverage this abstraction and design the vNPU allocation, mapping, and scheduling policies to maximize the resource utilization, while achieving both performance and security isolation for vNPU instances at runtime.","PeriodicalId":330470,"journal":{"name":"Proceedings of the 19th Workshop on Hot Topics in Operating Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125845543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Prefetching Using Principles of Hippocampal-Neocortical Interaction
Michael Wu, Ketaki Joshi, Andrew Sheinberg, Guilherme Cox, Anurag Khandelwal, Raghavendra Pradyumna Pothukuchi, Abhishek Bhattacharjee
DOI: https://doi.org/10.1145/3593856.3595901
Memory prefetching improves performance across many systems layers. However, achieving high prefetch accuracy with low overhead is challenging, as memory hierarchies and application memory access patterns become more complicated. Furthermore, a prefetcher's ability to adapt to new access patterns as they emerge is becoming more crucial than ever. Recent work has demonstrated the use of deep learning techniques to improve prefetching accuracy, albeit with impractical compute and storage overheads. This paper suggests taking inspiration from the learning mechanisms and memory architecture of the human brain---specifically, the hippocampus and neocortex---to build resource-efficient, accurate, and adaptable prefetchers.
{"title":"Prefetching Using Principles of Hippocampal-Neocortical Interaction","authors":"Michael Wu, Ketaki Joshi, Andrew Sheinberg, Guilherme Cox, Anurag Khandelwal, Raghavendra Pradyumna Pothukuchi, A. Bhattacharjee","doi":"10.1145/3593856.3595901","DOIUrl":"https://doi.org/10.1145/3593856.3595901","url":null,"abstract":"Memory prefetching improves performance across many systems layers. However, achieving high prefetch accuracy with low overhead is challenging, as memory hierarchies and application memory access patterns become more complicated. Furthermore, a prefetcher's ability to adapt to new access patterns as they emerge is becoming more crucial than ever. Recent work has demonstrated the use of deep learning techniques to improve prefetching accuracy, albeit with impractical compute and storage overheads. This paper suggests taking inspiration from the learning mechanisms and memory architecture of the human brain---specifically, the hippocampus and neocortex---to build resource-efficient, accurate, and adaptable prefetchers.","PeriodicalId":330470,"journal":{"name":"Proceedings of the 19th Workshop on Hot Topics in Operating Systems","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115547902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Towards a Manageable Intra-Host Network
Xinhao Kong, Jiaqi Lou, Wei Bai, Nam Sung Kim, Danyang Zhuo
DOI: https://doi.org/10.1145/3593856.3595890
Intra-host networks, including heterogeneous devices and interconnect fabrics, have become increasingly complex and crucial. However, intra-host networks today do not provide sufficient manageability. This prevents data center operators from running a reliable and efficient end-to-end network, especially for multi-tenant clouds. In this paper, we analyze the main manageability deficiencies of intra-host networks and argue for a systematic solution to bridge this functionality gap. We propose two key building blocks for a manageable intra-host network: a fine-grained monitoring system and a holistic resource manager. We discuss the research questions associated with realizing these two building blocks.
{"title":"Towards a Manageable Intra-Host Network","authors":"Xinhao Kong, Jiaqi Lou, Wei Bai, Nan Sung Kim, Danyang Zhuo","doi":"10.1145/3593856.3595890","DOIUrl":"https://doi.org/10.1145/3593856.3595890","url":null,"abstract":"Intra-host networks, including heterogeneous devices and interconnect fabrics, have become increasingly complex and crucial. However, intra-host networks today do not provide sufficient manageability. This prevents data center operators from running a reliable and efficient end-to-end network, especially for multi-tenant clouds. In this paper, we analyze the main manageability deficiencies of intra-host networks and argue that a systematic solution should be implemented to bridge this function gap. We propose two key building blocks for a manageable intra-host network: a fine-grained monitoring system and a holistic resource manager. We discuss the research questions associated with realizing these two building blocks.","PeriodicalId":330470,"journal":{"name":"Proceedings of the 19th Workshop on Hot Topics in Operating Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129604581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Metal: An Open Architecture for Developing Processor Features
Siyao Zhao, Ali Mashtizadeh
DOI: https://doi.org/10.1145/3593856.3595915
In recent years, a growing number of hardware devices, such as smart NICs, have begun providing programming interfaces to developers. Processor vendors use microcode to implement processor features such as Intel SGX and VT-x, which lets architects quickly evolve processor designs. However, modern processors still lack general programmability: microcode is inaccessible to system developers, so they cannot define custom processor features. We argue that processors should expose this capability to developers, enabling new operating system and application designs. We propose Metal, a novel open architecture that enables system developers to define custom instructions with microcode-level overhead. We implement a prototype of Metal on a 5-stage pipelined RISC processor using minimal additional hardware resources, and we demonstrate Metal's capability by building a variety of architectural extensions such as user-defined privilege levels. We also discuss other potential applications and future directions for Metal.
{"title":"Metal: An Open Architecture for Developing Processor Features","authors":"Siyao Zhao, A. Mashtizadeh","doi":"10.1145/3593856.3595915","DOIUrl":"https://doi.org/10.1145/3593856.3595915","url":null,"abstract":"In recent years, an increasing number of hardware devices started providing programming interfaces to developers such as smart NICs. Processor vendors use microcode to extend processors' features such as Intel SGX and VT-x. This enables processor architects to quickly evolve processor designs and features. However, modern processors still lack general programmability as microcode is inaccessible to system developers. Developers still cannot define custom processor features. We argue that processors should expose this capability to developers, which enables new operating system and application designs. We propose Metal, a novel open architecture that enables system developers to define custom instructions with microcode level overhead. We implement a prototype of Metal on a 5-stage pipelined RISC processor with minimal additional hardware resources. We demonstrate Metal's capability by building a variety of architectural extensions such as user defined privilege levels. We also discuss other potential applications and future directions for Metal.","PeriodicalId":330470,"journal":{"name":"Proceedings of the 19th Workshop on Hot Topics in Operating Systems","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124001919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Towards Increased Datacenter Efficiency with Soft Memory
Megan Frisella, Shirley Loayza Sanchez, Malte Schwarzkopf
DOI: https://doi.org/10.1145/3593856.3595902
Memory is the bottleneck resource in today's datacenters because it is inflexible: low-priority processes are routinely killed to free up resources during memory pressure. This wastes CPU cycles upon re-running killed jobs and incentivizes datacenter operators to run at low memory utilization for safety. This paper introduces soft memory, a software-level abstraction on top of standard primary storage that, under memory pressure, makes memory revocable for re-allocation elsewhere. We prototype soft memory with the Redis key-value store, and find that it has low overhead.
{"title":"Towards Increased Datacenter Efficiency with Soft Memory","authors":"Megan Frisella, Shirley Loayza Sanchez, Malte Schwarzkopf","doi":"10.1145/3593856.3595902","DOIUrl":"https://doi.org/10.1145/3593856.3595902","url":null,"abstract":"Memory is the bottleneck resource in today's datacenters because it is inflexible: low-priority processes are routinely killed to free up resources during memory pressure. This wastes CPU cycles upon re-running killed jobs and incentivizes datacenter operators to run at low memory utilization for safety. This paper introduces soft memory, a software-level abstraction on top of standard primary storage that, under memory pressure, makes memory revocable for re-allocation elsewhere. We prototype soft memory with the Redis key-value store, and find that it has low overhead.","PeriodicalId":330470,"journal":{"name":"Proceedings of the 19th Workshop on Hot Topics in Operating Systems","volume":"78 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129732632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

FBMM: Using the VFS for Extensibility in Kernel Memory Management
Bijan Tabatabai, Mark Mansi, Michael Swift
DOI: https://doi.org/10.1145/3593856.3595908
Modern memory hierarchies are increasingly complex, with more memory types and richer topologies. Unfortunately, kernel memory managers lack the extensibility that many other parts of the kernel use to support diversity. This makes it difficult to add and deploy support for new memory configurations, such as tiered memory: engineers must navigate and modify the monolithic memory management code to add support, and custom kernels are needed to deploy such support until it is upstreamed. We take inspiration from filesystems and note that VFS, the extensible interface for filesystems, supports a huge variety of filesystems for different media and different use cases and, importantly, has interfaces for memory management operations such as controlling virtual-to-physical mappings and handling page faults. We propose writing memory management systems as filesystems using VFS, bringing extensibility to kernel memory management. We call this idea File-Based Memory Management (FBMM). Using this approach, many recent memory management extensions, e.g., tiering support, can be written without modifying existing memory management code. We prototype FBMM in Linux to show that the overhead of extensibility is low (within 1.6%) and that it enables useful extensions.
{"title":"FBMM: Using the VFS for Extensibility in Kernel Memory Management","authors":"B. Tabatabai, Mark Mansi, M. Swift","doi":"10.1145/3593856.3595908","DOIUrl":"https://doi.org/10.1145/3593856.3595908","url":null,"abstract":"Modern memory hierarchies are increasingly complex, with more memory types and richer topologies. Unfortunately kernel memory managers lack the extensibility that many other parts of the kernel use to support diversity. This makes it difficult to add and deploy support for new memory configurations, such as tiered memory: engineers must navigate and modify the monolithic memory management code to add support, and custom kernels are needed to deploy such support until it is upstreamed. We take inspiration from filesystems and note that VFS, the extensible interface for filesystems, supports a huge variety of filesystems for different media and different use cases, and importantly, has interfaces for memory management operations such as controlling virtual-to-physical mapping and handling page faults. We propose writing memory management systems as filesystems using VFS, bringing extensibility to kernel memory management. We call this idea File-Based Memory Management (FBMM). Using this approach, many recent memory management extensions, e.g., tiering support, can be written without modifying existing memory management code. We prototype FBMM in Linux to show that the overhead of extensibility is low (within 1.6%) and that it enables useful extensions.","PeriodicalId":330470,"journal":{"name":"Proceedings of the 19th Workshop on Hot Topics in Operating Systems","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114267081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Skadi: Building a Distributed Runtime for Data Systems in Disaggregated Data Centers
Cunchen Hu, Chenxi Wang, Sa Wang, Ninghui Sun, Yungang Bao, Jieru Zhao, Sanidhya Kashyap, Pengfei Zuo, Xusheng Chen, Liangliang Xu, Qin Zhang, Hao Feng, Yizhou Shan
DOI: https://doi.org/10.1145/3593856.3595897
Data-intensive systems are the backbone of today's computing and are responsible for shaping data centers. Over the years, cloud providers have relied on three principles to keep data systems cost-effective: use disaggregation to decouple scaling, use domain-specific computing to counter the waning of hardware scaling laws, and use serverless to lower costs. Although these principles work well individually, they fail to work in harmony: an issue amplified by emerging data-system workloads. In this paper, we envision a distributed runtime that mitigates these shortcomings. The distributed runtime has a tiered access layer exposing declarative APIs, underpinned by a stateful serverless runtime with a distributed task execution model; it serves as the narrow waist between data systems and hardware. Users are oblivious to data location, concurrency, disaggregation style, or even the hardware that performs the computation. The underlying stateful serverless runtime transparently evolves with novel data-center architectures, such as disaggregation and tightly-coupled clusters. We prototype Skadi to show that such a distributed runtime is practical.
{"title":"Skadi: Building a Distributed Runtime for Data Systems in Disaggregated Data Centers","authors":"Cunchen Hu, Chenxi Wang, Sa Wang, Ninghui Sun, Yungang Bao, Jieru Zhao, Sanidhya Kashyap, Pengfei Zuo, Xusheng Chen, Liangliang Xu, Qin Zhang, Hao Feng, Yizhou Shan","doi":"10.1145/3593856.3595897","DOIUrl":"https://doi.org/10.1145/3593856.3595897","url":null,"abstract":"Data-intensive systems are the backbone of today's computing and are responsible for shaping data centers. Over the years, cloud providers have relied on three principles to maintain cost-effective data systems: use disaggregation to decouple scaling, use domain-specific computing to battle waning laws, and use serverless to lower costs. Although they work well individually, they fail to work in harmony: an issue amplified by emerging data system workloads. In this paper, we envision a distributed runtime to mitigate current shortcomings. The distributed runtime has a tiered access layer exposing declarative APIs, underpinned by a stateful serverless runtime with a distributed task execution model. It will be the narrow waist between data systems and hardware. Users are oblivious to data location, concurrency, disaggregation style, or even the hardware to do the computing. The underlying stateful serverless runtime transparently evolves with novel data-center architectures, such as disaggregation and tightly-coupled clusters. We prototype Skadi to showcase that the distributed runtime is practical.","PeriodicalId":330470,"journal":{"name":"Proceedings of the 19th Workshop on Hot Topics in Operating Systems","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131550581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Beyond isolation: OS verification as a foundation for correct applications
Matthias Brun, Reto Achermann, Tej Chajed, Jon Howell, Gerd Zellweger, Andrea Lattuada
DOI: https://doi.org/10.1145/3593856.3595899
Verified systems software has generally had to assume the correctness of the operating system and the services it provides (such as networking and the file system). Even though verified operating systems and file systems exist, the specifications for these components do not compose with applications to produce a fully verified, high-performance software stack. In this position paper, we lay out our vision of a verified OS running verified applications, all with good multi-core performance. We have already explored part of this verification effort by proving a page table implementation correct; the larger goal is to lay out a vision for an ambitious project that supports applications verified from their high-level specifications down to the hardware.
{"title":"Beyond isolation: OS verification as a foundation for correct applications","authors":"M. Brun, Reto Achermann, Tej Chajed, Jon Howell, Gerd Zellweger, Andrea Lattuada","doi":"10.1145/3593856.3595899","DOIUrl":"https://doi.org/10.1145/3593856.3595899","url":null,"abstract":"Verified systems software has generally had to assume the correctness of the operating system and its provided services (like networking and the file system). Even though there exist verified operating systems and file systems, the specifications for these components do not compose with applications to produce a fully verified high-performance software stack. In this position paper, we lay out our vision for what it would look like to have a verified OS with verified applications, all with good multi-core performance. We've explored a part of the verification by proving a page table correct already, but the larger goal is to lay out a vision for an ambitious project that supports an application verified from its high-level specification down to the hardware.","PeriodicalId":330470,"journal":{"name":"Proceedings of the 19th Workshop on Hot Topics in Operating Systems","volume":"69 16","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134195319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

The Case for Performance Interfaces for Hardware Accelerators
Rishabh R. Iyer, Jiacheng Ma, Katerina Argyraki, George Candea, Sylvia Ratnasamy
DOI: https://doi.org/10.1145/3593856.3595904
While systems designers are increasingly turning to hardware accelerators for performance gains, realizing these gains is painstaking and error-prone. It can take several person-months to determine if a given accelerator is a good fit for a given piece of code, and accelerators that cost millions of dollars to build can slow down the very systems they were designed to accelerate. We argue that hardware accelerators must come with performance interfaces---interfaces that provide usable information about the accelerator's performance behavior just like semantic interfaces do for functionality---to facilitate their correct use. Since accelerators do not provide new functionality and are only useful if they improve system performance, performance interfaces are as integral to their correct use as semantic interfaces.
{"title":"The Case for Performance Interfaces for Hardware Accelerators","authors":"Rishabh R. Iyer, Jiacheng Ma, K. Argyraki, George Candea, S. Ratnasamy","doi":"10.1145/3593856.3595904","DOIUrl":"https://doi.org/10.1145/3593856.3595904","url":null,"abstract":"While systems designers are increasingly turning to hardware accelerators for performance gains, realizing these gains is painstaking and error-prone. It can take several person-months to determine if a given accelerator is a good fit for a given piece of code, and accelerators that cost millions of dollars to build can slow down the very systems they were designed to accelerate. We argue that hardware accelerators must come with performance interfaces---interfaces that provide usable information about the accelerator's performance behavior just like semantic interfaces do for functionality---to facilitate their correct use. Since accelerators do not provide new functionality and are only useful if they improve system performance, performance interfaces are as integral to their correct use as semantic interfaces.","PeriodicalId":330470,"journal":{"name":"Proceedings of the 19th Workshop on Hot Topics in Operating Systems","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131304039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Kernel extension verification is untenable
Jinghao Jia, Raj Sahu, Adam Oswald, Dan Williams, Michael V. Le, Tianyin Xu
DOI: https://doi.org/10.1145/3593856.3595892
The emergence of verified eBPF bytecode is ushering in a new era of safe kernel extensions. In this paper, we argue that eBPF's verifier---the source of its safety guarantees---has become a liability. In addition to the well-known bugs and vulnerabilities stemming from the complexity and ad hoc nature of the in-kernel verifier, we highlight a concerning trend: escape hatches to unsafe kernel functions (in the form of helper functions) are being introduced to bypass verifier-imposed limitations on expressiveness, unfortunately also bypassing its safety guarantees. We propose building safe kernel extension frameworks on a balance of not just static but also lightweight runtime techniques. We describe a design centered on kernel extensions written in safe Rust that eliminates the need for the in-kernel verifier, improves expressiveness, reduces the need for escape hatches, and ultimately improves the safety of kernel extensions.
{"title":"Kernel extension verification is untenable","authors":"Jinghao Jia, R. Sahu, Adam Oswald, Daniel W. Williams, Michael V. Le, Tianyi Xu","doi":"10.1145/3593856.3595892","DOIUrl":"https://doi.org/10.1145/3593856.3595892","url":null,"abstract":"The emergence of verified eBPF bytecode is ushering in a new era of safe kernel extensions. In this paper, we argue that eBPF's verifier---the source of its safety guarantees---has become a liability. In addition to the well-known bugs and vulnerabilities stemming from the complexity and ad hoc nature of the in-kernel verifier, we highlight a concerning trend in which escape hatches to unsafe kernel functions (in the form of helper functions) are being introduced to bypass verifier-imposed limitations on expressiveness, unfortunately also bypassing its safety guarantees. We propose safe kernel extension frameworks using a balance of not just static but also lightweight runtime techniques. We describe a design centered around kernel extensions in safe Rust that will eliminate the need of the in-kernel verifier, improve expressiveness, allow for reduced escape hatches, and ultimately improve the safety of kernel extensions.","PeriodicalId":330470,"journal":{"name":"Proceedings of the 19th Workshop on Hot Topics in Operating Systems","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121263879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}