Soo Yee Lim, Sidhartha Agrawal, Xueyuan Han, David Eyers, Dan O'Keeffe, Thomas Pasquier
Monolithic operating systems, where all kernel functionality resides in a single, shared address space, are the foundation of most mainstream computer systems. However, a single flaw, even in a non-essential part of the kernel (e.g., device drivers), can cause the entire operating system to fall under an attacker's control. Kernel hardening techniques might prevent certain types of vulnerabilities, but they fail to address a fundamental weakness: the lack of intra-kernel security that safely isolates different parts of the kernel. We survey kernel compartmentalization techniques that define and enforce intra-kernel boundaries and propose a taxonomy that allows the community to compare and discuss future work. We also identify factors that complicate comparisons among compartmentalized systems, suggest new ways to compare future approaches with existing work meaningfully, and discuss emerging research directions.
Securing Monolithic Kernels using Compartmentalization. arXiv:2404.08716, arXiv - CS - Operating Systems, published 2024-04-12.
In today's digital security ecosystem, where threats evolve rapidly and grow in complexity, companies developing Endpoint Detection and Response (EDR) solutions are in constant search of innovations that not only keep up with but also anticipate emerging attack vectors. In this context, this article introduces HookChain, a new perspective on widely known techniques that, when combined, provide an additional layer of sophisticated evasion against traditional EDR systems. Through a precise combination of IAT hooking, dynamic SSN resolution, and indirect system calls, HookChain redirects the execution flow of Windows subsystems in a way that remains invisible to EDRs that act only on Ntdll.dll, without requiring changes to the source code of the applications and malware involved. This work not only challenges current conventions in cybersecurity but also sheds light on a promising path for future protection strategies, on the understanding that continuous evolution is key to effective digital security. By developing and exploring the HookChain technique, this study contributes to the body of knowledge in endpoint security, stimulating the development of more robust and adaptive solutions that can address the ever-changing dynamics of digital threats. It aspires to inspire reflection and advancement in the research and development of security technologies that stay several steps ahead of adversaries.
HookChain: A new perspective for Bypassing EDR Solutions. Helvio Carvalho Junior. arXiv:2404.16856, arXiv - CS - Operating Systems, published 2024-04-04.
Sunita Jain, Nagaradhesh Yeleswarapu, Hasan Al Maruf, Rita Gupta
Compute Express Link (CXL) is a rapidly emerging coherent interconnect standard that provides opportunities for memory pooling and sharing. Memory sharing is a well-established software feature that improves memory utilization by avoiding unnecessary data movement. In this paper, we discuss multiple approaches to enable memory sharing with different generations of CXL protocol (i.e., CXL 2.0 and CXL 3.0) considering the challenges with each of the architectures from the device hardware and software viewpoint.
Memory Sharing with CXL: Hardware and Software Design Approaches. arXiv:2404.03245, arXiv - CS - Operating Systems, published 2024-04-04.
Emre Karabulut, Arsalan Ali Malik, Amro Awad, Aydin Aysu
Using correct design metrics and understanding the limitations of the underlying technology are critical to developing effective scheduling algorithms. Unfortunately, existing techniques for fair scheduling of multi-tenant FPGAs, where each tenant should receive approximately the same share of resources both spatially and temporally, used incorrect metrics and relied on unrealistic assumptions. This paper introduces an enhanced fair scheduling algorithm for multi-tenant FPGA use that addresses these metric and assumption issues, with three specific improvements. First, our method ensures spatiotemporal fairness by considering both spatial and temporal aspects, addressing the limitation of prior work that assumed uniform task latency. Second, we incorporate energy considerations into fairness by adjusting scheduling intervals and accounting for energy overhead, thereby balancing energy efficiency with fairness. Third, we acknowledge overlooked aspects of FPGA multi-tenancy, including heterogeneous regions and the constraints on dynamically merging/splitting partially reconfigurable regions. We develop and evaluate our improved fair scheduling algorithm with these three enhancements. Inspired by the Greek goddess of law and personification of justice, we name our fair scheduling solution THEMIS: Time, Heterogeneity, and Energy Minded Scheduling. We used the Xilinx Zedboard XC7Z020 to quantify our approach's savings. Compared to previous algorithms, our improved scheduling algorithm enhances fairness by 24.2-98.4% and allows a trade-off between a 55.3x gain in energy and a 69.3x gain in fairness. The paper thus informs cloud providers about future scheduling optimizations for fairness, with related challenges and opportunities.
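The spatiotemporal-fairness idea above can be sketched in a few lines. This is a hypothetical simplification, not the THEMIS algorithm itself: each tenant accumulates an "area x time" share plus an energy-overhead term, and the scheduler always admits the tenant with the smallest accumulated share. The class name, the `energy_weight` knob, and the starvation-avoidance rule for new tenants are all illustrative assumptions.

```python
# Hypothetical sketch of spatiotemporal fair scheduling in the spirit of
# THEMIS: tenants accumulate FPGA area x occupancy time (plus a weighted
# energy-overhead term), and the least-served tenant runs next.
class SpatiotemporalScheduler:
    def __init__(self, energy_weight=0.1):
        self.energy_weight = energy_weight   # illustrative energy/fairness knob
        self.shares = {}                     # tenant -> accumulated share

    def add_tenant(self, tenant):
        # Start newcomers at the current minimum share so they are not starved
        # and do not starve others (an assumption, not from the paper).
        self.shares[tenant] = min(self.shares.values(), default=0.0)

    def pick_next(self):
        # Fairness rule: the tenant with the least accumulated share runs next.
        return min(self.shares, key=self.shares.get)

    def record_run(self, tenant, area, duration, energy_overhead=0.0):
        # Spatial (area) and temporal (duration) use both count toward
        # a tenant's share; energy overhead is folded in with a weight.
        self.shares[tenant] += area * duration + self.energy_weight * energy_overhead
```

With two tenants, a long run on a large region pushes that tenant behind a tenant that has consumed less area-time, regardless of which ran more recently.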
THEMIS: Time, Heterogeneity, and Energy Minded Scheduling for Fair Multi-Tenant Use in FPGAs. arXiv:2404.00507, arXiv - CS - Operating Systems, published 2024-03-31.
Operating systems (OSes) are foundational to computer systems, managing hardware resources and ensuring secure environments for diverse applications. However, despite their enduring importance, the fundamental design objectives of OSes have seen minimal evolution over decades. Traditionally prioritizing aspects like speed, memory efficiency, security, and scalability, these objectives often overlook the crucial aspect of intelligence as well as personalized user experience. The lack of intelligence becomes increasingly critical amid technological revolutions, such as the remarkable advancements in machine learning (ML). Today's personal devices, evolving into intimate companions for users, pose unique challenges for traditional OSes like Linux and iOS, especially with the emergence of specialized hardware featuring heterogeneous components. Furthermore, the rise of large language models (LLMs) in ML has introduced transformative capabilities, reshaping user interactions and software development paradigms. While existing literature predominantly focuses on leveraging ML methods for system optimization or accelerating ML workloads, there is a significant gap in addressing personalized user experiences at the OS level. To tackle this challenge, this work proposes PerOS, a personalized OS ingrained with LLM capabilities. PerOS aims to provide tailored user experiences while safeguarding privacy and personal data through declarative interfaces, self-adaptive kernels, and secure data management in a scalable cloud-centric architecture; therein lies the main research question of this work: How can we develop intelligent, secure, and scalable OSes that deliver personalized experiences to thousands of users?
PerOS: Personalized Self-Adapting Operating Systems in the Cloud. Hongyu Hè. arXiv:2404.00057, arXiv - CS - Operating Systems, published 2024-03-26.
Arastoo Bozorgi, Mahya Soleimani Jadidi, Jonathan Anderson
Strong confidentiality, integrity, user control, reliability, and performance are critical requirements in privacy-sensitive applications. Such applications would benefit from a data storage and sharing infrastructure that provides these properties even in decentralized topologies with untrusted storage backends, but users today are forced to choose between systemic security properties and system reliability or performance. As an alternative to this status quo we present UPSS: the user-centric private sharing system, a cryptographic storage system that can be used as a conventional filesystem or as the foundation for security-sensitive applications such as redaction with integrity and private revision control. We demonstrate that both the security and performance properties of UPSS exceed those of existing cryptographic filesystems and that its performance is comparable to, and in some cases superior to, that of mature conventional filesystems. Whether used directly via its Rust API or as a conventional filesystem, UPSS provides strong security and practical performance on untrusted storage.
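The core idea of a cryptographic store over an untrusted backend can be illustrated with a toy block store: blocks are encrypted client-side, named by the hash of their ciphertext, and referenced by (name, key) pairs. This is a sketch under stated assumptions, not the UPSS design: the convergent per-block key and the SHA-256-based XOR keystream are stand-ins for whatever real cipher and key management a production system would use.

```python
# Toy content-addressed encrypted block store. NOT real cryptography:
# the XOR keystream below is a placeholder for a proper AEAD cipher.
import hashlib

def _keystream_xor(data: bytes, key: bytes) -> bytes:
    # Derive a keystream from SHA-256(key || counter) and XOR it with data.
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        out.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(b ^ k for b, k in zip(data, out))

class BlockStore:
    def __init__(self):
        self.blobs = {}  # name (hex hash of ciphertext) -> ciphertext

    def put(self, plaintext: bytes):
        key = hashlib.sha256(plaintext).digest()   # convergent key (assumption)
        ciphertext = _keystream_xor(plaintext, key)
        name = hashlib.sha256(ciphertext).hexdigest()
        self.blobs[name] = ciphertext              # backend sees ciphertext only
        return name, key                           # the "block pointer"

    def get(self, name: str, key: bytes) -> bytes:
        return _keystream_xor(self.blobs[name], key)
```

Because names are hashes of ciphertext and keys are derived from plaintext, identical blocks deduplicate while the storage backend never observes plaintext.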
UPSS: a User-centric Private Storage System with its applications. arXiv:2403.15884, arXiv - CS - Operating Systems, published 2024-03-23.
Wangsong Yin, Mengwei Xu, Yuanchun Li, Xuanzhe Liu
As LLMs become more powerful and more deeply involved in user-device interactions, on-device execution becomes desirable to better preserve user privacy. In this work, we propose a new paradigm of mobile AI: LLM as a system service on mobile devices (LLMaaS). Unlike traditional DNNs, which execute in a stateless manner, such a system service is stateful: LLM execution often needs to maintain persistent state (mainly the KV cache) across multiple invocations. To minimize LLM context-switching overhead under a tight device memory budget, this work presents LLMS, which decouples the memory management of app and LLM contexts with a key idea of fine-grained, chunk-wise, globally optimized KV cache compression and swapping. Fully leveraging the KV cache's unique characteristics, it proposes three novel techniques: (1) Tolerance-Aware Compression: chunks are compressed based on their measured accuracy tolerance to compression. (2) IO-Recompute Pipelined Loading: recomputation is introduced into swapping-in for acceleration. (3) Chunk Lifecycle Management: the memory activities of chunks are optimized with ahead-of-time swapping-out and eviction based on an LCTRU (Least Compression-Tolerable and Recently-Used) queue. In evaluations conducted on well-established traces and various edge devices, LLMS reduces context-switching latency by up to two orders of magnitude compared to competitive baseline solutions.
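The chunk-lifecycle idea can be sketched as a small cache manager. This is a hedged reading of the abstract, not the LLMS implementation: each KV-cache chunk carries a measured compression tolerance and a last-use time, and under memory pressure the manager swaps out the chunk ranked lowest by (tolerance, recency), one plausible interpretation of the LCTRU queue. Class names, the byte budget, and the ranking key are illustrative assumptions.

```python
# Hypothetical chunk-wise KV cache manager in the spirit of LLMS.
import time
from dataclasses import dataclass, field

@dataclass
class Chunk:
    chunk_id: int
    size: int          # bytes occupied by this KV-cache chunk
    tolerance: float   # measured accuracy tolerance to compression
    last_used: float = field(default_factory=time.monotonic)

class KVCacheManager:
    def __init__(self, budget: int):
        self.budget = budget          # device memory budget (illustrative)
        self.resident = {}            # chunk_id -> Chunk in memory
        self.swapped = {}             # chunk_id -> Chunk swapped to storage

    def _used(self) -> int:
        return sum(c.size for c in self.resident.values())

    def admit(self, chunk: Chunk) -> None:
        # Evict until the new chunk fits: victims are the chunks least
        # tolerable to compression and least recently used (LCTRU reading).
        while self._used() + chunk.size > self.budget and self.resident:
            victim = min(self.resident.values(),
                         key=lambda c: (c.tolerance, c.last_used))
            self.swapped[victim.chunk_id] = self.resident.pop(victim.chunk_id)
        self.resident[chunk.chunk_id] = chunk

    def touch(self, chunk_id: int) -> None:
        self.resident[chunk_id].last_used = time.monotonic()
```

A real system would additionally compress victims before swapping and pipeline recompute with I/O on the swap-in path, as the abstract describes.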
LLM as a System Service on Mobile Devices. arXiv:2403.11805, arXiv - CS - Operating Systems, published 2024-03-18.
The growing value of data as a strategic asset has given rise to the necessity of implementing reliable backup and recovery solutions in the most efficient and cost-effective manner. The data backup methods available today on Linux are not effective enough because, while running, most of them block I/Os to guarantee data integrity. We propose and implement Next4, a file-system-based snapshot feature in Ext4 that creates an instant image of the file system to provide incremental versions of data, enabling reliable backup and data recovery. In our design, the snapshot feature is implemented by efficiently infusing the copy-on-write strategy into the write-in-place, extent-based Ext4 file system, without affecting its basic structure. Each snapshot is an incremental backup of the data within the system. What distinguishes Next4 is the way the data is backed up, improving both space utilization and performance.
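The snapshot semantics described above can be illustrated with a toy volume model. This is not the Next4 design: for simplicity the sketch freezes the whole logical-block map by copying it eagerly, whereas a real copy-on-write implementation would share extents between snapshots and copy only the blocks that are subsequently overwritten.

```python
# Toy snapshot semantics: a snapshot is an instant, read-only image of
# the block map; later writes do not disturb earlier snapshots.
class CowVolume:
    def __init__(self):
        self.blocks = {}      # logical block number -> data
        self.snapshots = []   # frozen block maps, one per snapshot

    def write(self, lbn, data):
        self.blocks[lbn] = data

    def snapshot(self):
        # Eager copy for clarity; real CoW would share unchanged extents.
        self.snapshots.append(dict(self.blocks))
        return len(self.snapshots) - 1   # snapshot id

    def read(self, lbn, snap=None):
        mapping = self.blocks if snap is None else self.snapshots[snap]
        return mapping.get(lbn)
```

Overwriting a block after a snapshot leaves the snapshot's view of that block intact, which is exactly the property that makes non-blocking incremental backup possible.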
Next4: Snapshots in Ext4 File System. Aditya Dani, Shardul Mangade, Piyush Nimbalkar, Harshad Shirwadkar. arXiv:2403.06790, arXiv - CS - Operating Systems, published 2024-03-11.
Byte-addressable non-volatile memory (NVM) sitting on the memory bus is employed to build persistent memory (PMem) for data storage in general-purpose computing systems and embedded systems. Researchers have developed software drivers such as the block translation table (BTT) to build block devices on PMem, so programmers can keep using the mature and reliable conventional storage stack while expecting high performance from fast PMem. However, our quantitative study shows that BTT underutilizes PMem and yields inferior performance due to the absence of an essential in-device cache. We add a conventional I/O staging cache made of DRAM space to BTT. Because DRAM and PMem have comparable access latency, the I/O staging cache is likely to fill up over time. Continual cache evictions and fsyncs then cause on-demand flushes with severe stalls, making a plain I/O staging cache unappealing for PMem-based block devices. We accordingly propose an algorithm named Caiti with novel I/O transit caching. Caiti eagerly evicts buffered data to PMem using the CPU's multiple cores. It also conditionally bypasses a full cache and writes data directly into PMem to further alleviate I/O stalls. Experiments confirm that Caiti significantly boosts performance over BTT by up to 3.6x, without loss of block-level write atomicity.
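The transit-caching write path can be sketched as follows. This is a hypothetical single-threaded simplification, not Caiti itself: writes stage in a DRAM cache and are eagerly drained to PMem (here a plain dict, with `drain` standing in for the multi-core background evictors), and a write that arrives while the cache is full bypasses it and goes straight to PMem instead of stalling on an on-demand flush.

```python
# Hypothetical sketch of I/O transit caching: stage writes in DRAM,
# drain eagerly to PMem, and bypass the cache when it is full.
from collections import OrderedDict

class TransitCache:
    def __init__(self, capacity, pmem):
        self.capacity = capacity
        self.cache = OrderedDict()   # block -> data staged in DRAM
        self.pmem = pmem             # dict standing in for the PMem device
        self.bypassed = 0            # writes that skipped the full cache

    def write(self, block, data):
        if block not in self.cache and len(self.cache) >= self.capacity:
            # Cache full: write directly to PMem rather than stall on a flush.
            self.pmem[block] = data
            self.bypassed += 1
            return
        self.cache[block] = data

    def drain(self, n=1):
        # Eager eviction; in Caiti this runs on multiple CPU cores.
        for _ in range(min(n, len(self.cache))):
            block, data = self.cache.popitem(last=False)
            self.pmem[block] = data

    def read(self, block):
        return self.cache.get(block, self.pmem.get(block))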
{"title":"I/O Transit Caching for PMem-based Block Device","authors":"Qing Xu, Qisheng Jiang, Chundong Wang","doi":"arxiv-2403.06120","DOIUrl":"https://doi.org/arxiv-2403.06120","url":null,"abstract":"Byte-addressable non-volatile memory (NVM) sitting on the memory bus is\u0000employed to make persistent memory (PMem) in general-purpose computing systems\u0000and embedded systems for data storage. Researchers develop software drivers\u0000such as the block translation table (BTT) to build block devices on PMem, so\u0000programmers can keep using mature and reliable conventional storage stack while\u0000expecting high performance by exploiting fast PMem. However, our quantitative\u0000study shows that BTT underutilizes PMem and yields inferior performance, due to\u0000the absence of the imperative in-device cache. We add a conventional I/O\u0000staging cache made of DRAM space to BTT. As DRAM and PMem have comparable\u0000access latency, I/O staging cache is likely to be fully filled over time.\u0000Continual cache evictions and fsyncs thus cause on-demand flushes with severe\u0000stalls, such that the I/O staging cache is concretely unappealing for\u0000PMem-based block devices. We accordingly propose an algorithm named Caiti with\u0000novel I/O transit caching. Caiti eagerly evicts buffered data to PMem through\u0000CPU's multi-cores. It also conditionally bypasses a full cache and directly\u0000writes data into PMem to further alleviate I/O stalls. 
Experiments confirm that\u0000Caiti significantly boosts the performance with BTT by up to 3.6x, without loss\u0000of block-level write atomicity.","PeriodicalId":501333,"journal":{"name":"arXiv - CS - Operating Systems","volume":"2016 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140106471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Virtual memory is a cornerstone of modern computing systems.Introduced as one of the earliest instances of hardware-software co-design, VM facilitates programmer-transparent memory man agement, data sharing, process isolation and memory protection. Evaluating the efficiency of various virtual memory (VM) designs is crucial (i) given their significant impact on the system, including the CPU caches, the main memory, and the storage device and (ii) given that different system architectures might benefit from various VM techniques. Such an evaluation is not straightforward, as it heavily hinges on modeling the interplay between different VM techniques and the interactions of VM with the system architecture. Modern simulators, however, struggle to keep up with the rapid VM research developments, lacking the capability to model a wide range of contemporary VM techniques and their interactions. To this end, we present Virtuoso, an open-source, comprehensive and modular simulation framework that models various VM designs to establish a common ground for virtual memory research. We demonstrate the versatility and the potential of Virtuoso with four new case studies. Virtuoso is freely open-source and can be found at https://github.com/CMU-SAFARI/Virtuoso.
虚拟内存是现代计算系统的基石。作为硬件-软件协同设计的最早实例之一,虚拟内存为程序员透明的内存管理、数据共享、进程隔离和内存保护提供了便利。评估各种虚拟内存(VM)设计的效率至关重要:(i) 因为它们对系统(包括 CPU 高速缓存、主存储器和存储设备)有重大影响;(ii) 因为不同的系统架构可能受益于各种虚拟内存技术。这样的评估并不简单,因为它在很大程度上取决于对不同虚拟机技术之间的相互作用以及虚拟机与系统架构之间的相互作用进行建模。然而,现代模拟器难以跟上虚拟机研究的快速发展,缺乏对各种当代虚拟机技术及其交互进行建模的能力。为此,我们提出了一个开源、全面和模块化的仿真框架--Virtuoso,它可以模拟各种虚拟机设计,为虚拟内存研究建立一个共同基础。我们通过四个新案例研究展示了 Virtuoso 的多功能性和潜力。Virtuoso免费开源,可在https://github.com/CMU-SAFARI/Virtuoso。
{"title":"Virtuoso: An Open-Source, Comprehensive and Modular Simulation Framework for Virtual Memory Research","authors":"Konstantinos Kanellopoulos, Konstantinos Sgouras, Onur Mutlu","doi":"arxiv-2403.04635","DOIUrl":"https://doi.org/arxiv-2403.04635","url":null,"abstract":"Virtual memory is a cornerstone of modern computing systems.Introduced as one\u0000of the earliest instances of hardware-software co-design, VM facilitates\u0000programmer-transparent memory man agement, data sharing, process isolation and\u0000memory protection. Evaluating the efficiency of various virtual memory (VM)\u0000designs is crucial (i) given their significant impact on the system, including\u0000the CPU caches, the main memory, and the storage device and (ii) given that\u0000different system architectures might benefit from various VM techniques. Such\u0000an evaluation is not straightforward, as it heavily hinges on modeling the\u0000interplay between different VM techniques and the interactions of VM with the\u0000system architecture. Modern simulators, however, struggle to keep up with the\u0000rapid VM research developments, lacking the capability to model a wide range of\u0000contemporary VM techniques and their interactions. To this end, we present\u0000Virtuoso, an open-source, comprehensive and modular simulation framework that\u0000models various VM designs to establish a common ground for virtual memory\u0000research. We demonstrate the versatility and the potential of Virtuoso with\u0000four new case studies. 
Virtuoso is freely open-source and can be found at\u0000https://github.com/CMU-SAFARI/Virtuoso.","PeriodicalId":501333,"journal":{"name":"arXiv - CS - Operating Systems","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140070467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}