
arXiv - CS - Operating Systems: Latest Publications

LLM as OS (llmao), Agents as Apps: Envisioning AIOS, Agents and the AIOS-Agent Ecosystem
Pub Date : 2023-12-06 DOI: arxiv-2312.03815
Yingqiang Ge, Yujie Ren, Wenyue Hua, Shuyuan Xu, Juntao Tan, Yongfeng Zhang
This paper envisions a revolutionary AIOS-Agent ecosystem, where the Large Language Model (LLM) serves as the (Artificial) Intelligent Operating System (IOS, or AIOS) -- an operating system "with soul". Upon this foundation, a diverse range of LLM-based AI Agent Applications (Agents, or AAPs) are developed, enriching the AIOS-Agent ecosystem and signaling a paradigm shift from the traditional OS-APP ecosystem. We envision that LLM's impact will not be limited to the AI application level; instead, it will in turn revolutionize the design and implementation of computer systems, architecture, software, and programming languages, featuring several main concepts: LLM as OS (system-level), Agents as Applications (application-level), Natural Language as Programming Interface (user-level), and Tools as Devices/Libraries (hardware/middleware-level).
Citations: 0
Robust Resource Partitioning Approach for ARINC 653 RTOS
Pub Date : 2023-12-03 DOI: arxiv-2312.01436
Vitaly Cheptsov, Alexey Khoroshilov
Modern airborne operating systems implement the concept of robust time and resource partitioning imposed by standards for aerospace and airborne-embedded software systems, such as ARINC 653. While these standards do provide a considerable number of design choices with regard to resource partitioning at the architectural and API levels, such as isolated memory spaces between application partitions, predefined resource configuration, and unidirectional ports with limited queue and message sizes for inter-partition communication, they do not specify how an operating system should implement them in software. Furthermore, they often tend to set only the minimal level of required guarantees, for example in terms of memory permissions, and disregard the hardware state of the art, which presently can provide considerably stronger guarantees at no extra cost. In this paper we present an architecture of robust resource partitioning for ARINC 653 real-time operating systems based on a completely static MMU configuration. The architecture was implemented on different types of airborne hardware, including platforms with TLB-based and page-table-based MMUs. Key benefits of the proposed approach include minimised run-time overhead and simpler verification of the memory subsystem.
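The appeal of a completely static MMU configuration is that partitioning invariants can be checked offline, before the system ever boots. The toy sketch below illustrates that idea under stated assumptions; the `Partition` record and `validate_static_map` checks are illustrative, not the paper's implementation.

```python
# Toy sketch of a fully static partition memory map in the spirit of
# ARINC 653 robust partitioning: every partition gets a fixed, disjoint
# physical range and fixed permissions, all decided at build time.
# Names and fields are hypothetical, for illustration only.
from dataclasses import dataclass

@dataclass(frozen=True)
class Partition:
    name: str
    phys_base: int   # start of the partition's physical range
    size: int        # bytes
    perms: str       # e.g. "rw" for data, "rx" for code

def validate_static_map(partitions):
    """Reject overlapping ranges and writable+executable mappings --
    the kind of invariant a static MMU configuration lets us verify
    once, offline, instead of trusting run-time remapping code."""
    ranges = sorted((p.phys_base, p.phys_base + p.size, p.name)
                    for p in partitions)
    for (s1, e1, n1), (s2, e2, n2) in zip(ranges, ranges[1:]):
        if s2 < e1:
            raise ValueError(f"partitions {n1} and {n2} overlap")
    for p in partitions:
        if "w" in p.perms and "x" in p.perms:
            raise ValueError(f"partition {p.name} is writable and executable")
    return True
```

Because the map never changes at run time, a check like this doubles as part of the memory-subsystem verification argument the abstract mentions.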
Citations: 0
MaxMem: Colocation and Performance for Big Data Applications on Tiered Main Memory Servers
Pub Date : 2023-12-01 DOI: arxiv-2312.00647
Amanda Raybuck (The University of Texas at Austin), Wei Zhang (Microsoft), Kayvan Mansoorshahi (The University of Texas at Austin), Aditya K. Kamath (University of Washington), Mattan Erez (The University of Texas at Austin), Simon Peter (University of Washington)
We present MaxMem, a tiered main memory management system that aims to maximize Big Data application colocation and performance. MaxMem uses an application-agnostic and lightweight memory occupancy control mechanism based on fast memory miss ratios to provide application QoS under increasing colocation. By relying on memory access sampling and binning to quickly identify per-process memory heat gradients, MaxMem maximizes performance for many applications sharing tiered main memory simultaneously. MaxMem is designed as a user-space memory manager to be easily modifiable and extensible, without complex kernel code development. On a system with tiered main memory consisting of DRAM and Intel Optane persistent memory modules, our evaluation confirms that MaxMem provides 11% and 38% better throughput and up to 80% and an order of magnitude lower 99th percentile latency than HeMem and Linux AutoNUMA, respectively, with a Big Data key-value store in dynamic colocation scenarios.
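A miss-ratio-driven occupancy controller can be sketched in a few lines. The control rule, thresholds, and names below are assumptions for illustration, not MaxMem's actual policy:

```python
# Minimal sketch of fast-memory occupancy control driven by the fast-tier
# miss ratio: grow a process's fast-tier quota while it misses too often,
# shrink it when it is comfortably within target. Illustrative only.
def adjust_quota(quota_pages, fast_misses, fast_hits,
                 target_miss_ratio=0.05, step_pages=256,
                 min_pages=256, max_pages=1 << 20):
    """Return the new fast-tier quota (in pages) for one control epoch."""
    total = fast_hits + fast_misses
    miss_ratio = fast_misses / total if total else 0.0
    if miss_ratio > target_miss_ratio:          # starved: grant more fast memory
        return min(quota_pages + step_pages, max_pages)
    if miss_ratio < target_miss_ratio / 2:      # over-provisioned: reclaim some
        return max(quota_pages - step_pages, min_pages)
    return quota_pages                          # within band: leave it alone
```

The hysteresis band (between half the target and the target) keeps the controller from oscillating when a process sits right at its target miss ratio.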
Citations: 0
Cascade: A Platform for Delay-Sensitive Edge Intelligence
Pub Date : 2023-11-29 DOI: arxiv-2311.17329
Weijia Song, Thiago Garrett, Yuting Yang, Mingzhao Liu, Edward Tremel, Lorenzo Rosa, Andrea Merlina, Roman Vitenberg, Ken Birman
Interactive intelligent computing applications are increasingly prevalent, creating a need for AI/ML platforms optimized to reduce per-event latency while maintaining high throughput and efficient resource management. Yet many intelligent applications run on AI/ML platforms that optimize for high throughput even at the cost of high tail latency. Cascade is a new AI/ML hosting platform intended to untangle this puzzle. Innovations include a legacy-friendly storage layer that moves data with minimal copying and a "fast path" that collocates data and computation to maximize responsiveness. Our evaluation shows that Cascade reduces latency by orders of magnitude with no loss of throughput.
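The "fast path" idea, running the computation where the object already lives rather than copying the object to the requester, can be illustrated with a toy in-memory cluster. The `Node`/`Cluster` API below is hypothetical, not Cascade's:

```python
# Toy illustration of data/computation collocation: ship the function to
# the node that holds the object, so no cross-node copy of the value is
# needed on the request path. All names here are illustrative assumptions.
class Node:
    def __init__(self, name):
        self.name = name
        self.store = {}          # key -> value held locally

class Cluster:
    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}
        self.location = {}       # key -> name of the node holding it

    def put(self, key, value, node_name):
        self.nodes[node_name].store[key] = value
        self.location[key] = node_name

    def compute_on(self, key, fn):
        """Run fn next to the data and return (where_it_ran, result)."""
        node = self.nodes[self.location[key]]
        return node.name, fn(node.store[key])
```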
Citations: 0
Trace-enabled Timing Model Synthesis for ROS2-based Autonomous Applications
Pub Date : 2023-11-22 DOI: arxiv-2311.13333
Hazem Abaza, Debayan Roy, Shiqing Fan, Selma Saidi, Antonios Motakis
Autonomous applications are typically developed over Robot Operating System 2.0 (ROS2), even in time-critical systems like automotive. Recent years have seen increased interest in developing model-based timing analysis and schedule optimization approaches for ROS2-based applications. To complement these approaches, we propose a tracing and measurement framework to obtain timing models of ROS2-based applications. It offers a tracer based on the extended Berkeley Packet Filter that probes different functions in the ROS2 middleware and reads their arguments or return values to reason about the data flow in applications. It combines event traces from ROS2 and the operating system to generate a directed acyclic graph showing ROS2 callbacks, precedence relations between them, and their timing attributes. While being compatible with existing analyses, we also show how to model (i) message synchronization, e.g., in sensor fusion, and (ii) service requests from multiple clients, e.g., in motion planning. Considering that, in real-world scenarios, the application code might be confidential and formal models are unavailable, our framework still enables the application of existing analysis and optimization techniques.
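Deriving a callback DAG from trace events reduces to collecting precedence edges between publishing and subscribing callbacks. The sketch below assumes a simplified trace record of `(publishing_callback, topic, subscribing_callback)`; the real framework reconstructs these records from eBPF probes on the ROS2 middleware:

```python
# Sketch of building a callback precedence DAG from simplified trace
# records. The record shape is an assumption for illustration.
from collections import defaultdict

def build_callback_dag(records):
    """records: iterable of (src_callback, topic, dst_callback).
    Returns adjacency: callback -> set of successor callbacks."""
    dag = defaultdict(set)
    for src, _topic, dst in records:
        dag[src].add(dst)
    return dict(dag)

def predecessors(dag, callback):
    """All callbacks that must fire before `callback` can be triggered --
    useful for modeling message synchronization, e.g., sensor fusion."""
    return {src for src, dsts in dag.items() if callback in dsts}
```

A fusion callback with several predecessors is exactly the multi-input synchronization case (i) the abstract calls out.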
Citations: 0
Memory Management Strategies for an Internet of Things System
Pub Date : 2023-11-17 DOI: arxiv-2311.10458
Ana-Maria Comeagă, Iuliana Marin
The rise of the Internet has brought about significant changes in our lives, and the rapid expansion of the Internet of Things (IoT) is poised to have an even more substantial impact by connecting a wide range of devices across various application domains. IoT devices, especially low-end ones, are constrained by limited memory and processing capabilities, necessitating efficient memory management within IoT operating systems. This paper delves into the importance of memory management in IoT systems, with a primary focus on the design and configuration of such systems, as well as the scalability and performance of scene management. Effective memory management is critical for optimizing resource usage, responsiveness, and adaptability as the IoT ecosystem continues to grow. The study offers insights into memory allocation, scene execution, memory reduction, and system scalability within the context of an IoT system, ultimately highlighting the vital role that memory management plays in facilitating a seamless and efficient IoT experience.
Citations: 0
Telescope: Telemetry at Terabyte Scale
Pub Date : 2023-11-17 DOI: arxiv-2311.10275
Alan Nair, Sandeep Kumar, Aravinda Prasad, Andy Rudoff, Sreenivas Subramoney
Data-hungry applications that require terabytes of memory have become widespread in recent years. To meet the memory needs of these applications, data centers are embracing tiered memory architectures with near and far memory tiers. Precise, efficient, and timely identification of hot and cold data and their placement in appropriate tiers is critical for performance in such systems. Unfortunately, the existing state-of-the-art telemetry techniques for hot and cold data detection are ineffective at the terabyte scale. We propose Telescope, a novel technique that profiles different levels of the application's page table tree for fast and efficient identification of hot and cold data. Telescope is based on the observation that, for a memory- and TLB-intensive workload, higher levels of a page table tree are also frequently accessed during a hardware page table walk. Hence, the hotness of the higher levels of the page table tree essentially captures the hotness of its subtrees or address space sub-regions at a coarser granularity. We exploit this insight to quickly converge on even a few megabytes of hot data and efficiently identify several gigabytes of cold data in terabyte-scale applications. Importantly, such a technique can seamlessly scale to petabyte-scale applications. Telescope's telemetry achieves 90%+ precision and recall at just 0.009% single-CPU utilization for microbenchmarks with a 5 TB memory footprint. Memory tiering based on Telescope results in 5.6% to 34% throughput improvement for real-world benchmarks with a 1-2 TB memory footprint compared to other state-of-the-art telemetry techniques.
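The key observation can be made concrete with a small simulation: every access to a leaf page also touches one entry at each upper level of the page-table walk, so hit counts at a higher level summarize the heat of an entire subtree. The sketch below assumes a 4-level, x86-64-style walk with 9 index bits per level; it is an illustration of the observation, not Telescope's implementation:

```python
# Simulate page-table-walk hit counting: one counter per entry per level.
# An entry at level 1 (PUD) covers a 1 GiB subtree, so its count is a
# coarse-grained heat summary of that whole region.
from collections import Counter

LEVEL_SHIFTS = (39, 30, 21, 12)   # PGD, PUD, PMD, PTE index positions

def record_walk(counters, vaddr):
    """Bump the counter of the entry touched at every level of the walk."""
    for level, shift in enumerate(LEVEL_SHIFTS):
        index = (vaddr >> shift) & 0x1FF   # 9 index bits per level
        counters[level][index] += 1

def hottest_entry(counters, level):
    """Return (entry_index, hits) for the hottest entry at a given level."""
    (index, hits), = counters[level].most_common(1)
    return index, hits
```

Profiling at level 1 inspects at most 512 counters per table instead of millions of leaf entries, which is why the approach stays cheap at terabyte scale.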
Citations: 0
Nahida: In-Band Distributed Tracing with eBPF
Pub Date : 2023-11-15 DOI: arxiv-2311.09032
Wanqi Yang, Pengfei Chen, Kai Liu, Huxing Zhang
Microservices are commonly used in modern cloud-native applications to achieve agility. However, the complexity of service dependencies in large-scale microservices systems can lead to anomaly propagation, making fault troubleshooting a challenge. To address this issue, distributed tracing systems have been proposed to trace complete request execution paths, enabling developers to troubleshoot anomalous services. However, existing distributed tracing systems have limitations such as invasive instrumentation, trace loss, or inaccurate trace correlation. To overcome these limitations, we propose a new tracing system based on eBPF (extended Berkeley Packet Filter), named Nahida, that can track complete requests in the kernel without intrusion, regardless of programming language or implementation. Our evaluation results show that Nahida can track over 92% of requests with stable accuracy, even under high concurrency of user requests, while the state-of-the-art non-invasive approaches cannot track any of the requests. Importantly, Nahida can track requests served by a multi-threaded application, which none of the existing invasive tracing systems can handle by instrumenting tracing code into libraries. Moreover, the overhead introduced by Nahida is negligible, increasing service latency by only 1.55%-2.1%. Overall, Nahida provides an effective and non-invasive solution for distributed tracing.
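The correlation step a kernel-side tracer performs can be illustrated without eBPF: given an ordered stream of socket events observed in the kernel, a send on a connection followed by a recv on the same connection is one hop of the request path. The event shape below is an assumption for illustration; the real system collects such events with eBPF programs attached inside the kernel, with no application changes:

```python
# Toy reconstruction of a request path from kernel-observed socket events.
# events: ordered (service, op, conn_id) tuples with op in {"send", "recv"}.
def stitch_path(events):
    """Pair each send on a connection with the matching recv on the same
    connection, yielding the hop-by-hop path of one request."""
    pending = {}            # conn_id -> service that sent on it
    edges = []
    for service, op, conn in events:
        if op == "send":
            pending[conn] = service
        elif op == "recv" and conn in pending:
            edges.append((pending.pop(conn), service))
    return edges
```

Because the correlation key (the connection) is observed in the kernel, the application never has to propagate a trace ID itself -- the essence of "in-band", non-invasive tracing.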
Citations: 0
HAL 9000: Skynet's Risk Manager
Pub Date : 2023-11-15 DOI: arxiv-2311.09449
Tadeu Freitas, Mário Neto, Inês Dutra, João Soares, Manuel Correia, Rolando Martins
Intrusion Tolerant Systems (ITSs) are a necessary component for cyber-services/infrastructures. Additionally, as cyberattacks follow a multi-domain attack surface, a similar defensive approach should be applied, namely, the use of an evolving multi-disciplinary solution that combines ITS, cybersecurity, and Artificial Intelligence (AI). With the increased popularity of AI solutions, due to Big Data use-case scenarios and decision-support and automation scenarios, new opportunities to apply Machine Learning (ML) algorithms have emerged, namely ITS empowerment. Using ML algorithms, an ITS can augment its intrusion tolerance capability by learning from previous attacks and from known vulnerabilities. As such, this work's contribution is twofold: (1) an ITS architecture (Skynet) that builds on the state of the art and incorporates new components to increase its intrusion tolerance capability and its adaptability to new adversaries; (2) an improved Risk Manager design that leverages AI to improve ITSs by automatically assessing OS risks to intrusions and advising safer configurations. One of the reasons intrusions succeed is bad configurations or slow adaptability to new threats. This can be caused by the dependency that systems have on human intervention. One of the characteristics of the Skynet and HAL 9000 design is the removal of human intervention: being fully automated lowers the chance of successful intrusions caused by human error. Our experiments using Skynet show that HAL is able to choose configurations that are 15% safer than those of the state-of-the-art risk manager.
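The Risk Manager's core task, scoring candidate OS configurations against known vulnerabilities and advising the safest one, can be sketched as a toy scorer. The scoring scheme and field names below are illustrative assumptions, not the paper's design:

```python
# Toy configuration risk scorer: sum the severities of known
# vulnerabilities that match a configuration's component versions,
# then pick the candidate configuration with the lowest total risk.
def risk_score(config, vulns):
    """config: {component: version}; vulns: iterable of
    (component, affected_version, severity). Lower is safer."""
    return sum(sev for comp, ver, sev in vulns
               if config.get(comp) == ver)

def safest(configs, vulns):
    """Advise the candidate configuration with the lowest risk score."""
    return min(configs, key=lambda c: risk_score(c, vulns))
```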
Citations: 0
bpftime: userspace eBPF Runtime for Uprobe, Syscall and Kernel-User Interactions
Pub Date : 2023-11-14 DOI: arxiv-2311.07923
Yusheng Zheng, Tong Yu, Yiwei Yang, Yanpeng Hu, XiaoZheng Lai, Andrew Quinn
In kernel-centric operations, the uprobe component of eBPF frequently encounters performance bottlenecks, largely attributed to the overheads borne by context switches. Transitioning eBPF operations to user space bypasses these hindrances, thereby optimizing performance. This also enhances configurability and obviates the necessity for root access or privileges for kernel eBPF, subsequently minimizing the kernel attack surface. This paper introduces bpftime, a novel user-space eBPF runtime, which leverages binary rewriting to implement uprobe and syscall hook capabilities. Through bpftime, userspace uprobes achieve a 10x speed enhancement compared to their kernel counterparts without requiring dual context switches. Additionally, this runtime facilitates the programmatic hooking of syscalls within a process, both safely and efficiently. Bpftime can be seamlessly attached to any running process, limiting the need for either a restart or manual recompilation. Our implementation also extends to interprocess eBPF Maps within shared memory, catering to summary aggregation or control plane communication requirements. Compatibility with existing eBPF toolchains such as clang and libbpf is maintained, not only simplifying the development of user-space eBPF without necessitating any modifications but also supporting CO-RE through BTF. Through bpftime, we not only enhance uprobe performance but also extend the versatility and user-friendliness of the eBPF runtime in user space, paving the way for more efficient and secure kernel operations.
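Conceptually, a uprobe attaches an observer to a function in a running program without recompiling it, and bpftime's gain comes from running that observer entirely in user space. The sketch below mimics the idea at the Python level by rewriting a module attribute; bpftime itself operates on native code via binary rewriting, so `attach_uprobe` here is a hypothetical name for illustration, not the runtime's actual API.

```python
import functools
import math

def attach_uprobe(module, func_name, on_enter):
    """Replace module.func_name with a wrapper that fires on_enter before
    each call -- a user-space analogy to uprobe attachment (no kernel
    transition involved). Returns the original function for detaching."""
    original = getattr(module, func_name)

    @functools.wraps(original)
    def probed(*args, **kwargs):
        on_enter(func_name, args)          # the "probe program" runs here
        return original(*args, **kwargs)   # then control returns to the target

    setattr(module, func_name, probed)
    return original

events = []
attach_uprobe(math, "sqrt", lambda name, args: events.append((name, args)))
math.sqrt(9.0)  # the probe records the call; the result is unchanged
```

A kernel uprobe would trap into the kernel on every hit; rewriting the call target in place, as above, keeps the whole path in the traced process, which is the source of the reported speedup.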
Citations: 0