Vittorio Orbinato, Marco Carlo Feliciano, Domenico Cotroneo, Roberto Natella
Advanced Persistent Threats (APTs) represent the most threatening form of attack nowadays since they can stay undetected for a long time. Adversary emulation is a proactive approach for preparing against these attacks. However, adversary emulation tools lack the anti-detection abilities of APTs. We introduce Laccolith, a hypervisor-based solution for adversary emulation with anti-detection to fill this gap. We also present an experimental study to compare Laccolith with MITRE CALDERA, a state-of-the-art solution for adversary emulation, against five popular anti-virus products. We found that CALDERA cannot evade detection, limiting the realism of emulated attacks, even when combined with a state-of-the-art anti-detection framework. Our experiments show that Laccolith can hide its activities from all the tested anti-virus products, thus making it suitable for realistic emulations.
Laccolith: Hypervisor-Based Adversary Emulation with Anti-Detection. arXiv - CS - Operating Systems, 2023-11-14. https://doi.org/arxiv-2311.08274
Mohsen Karimi, Yidi Wang, Youngbin Kim, Yoojin Lim, Hyoseung Kim
This paper presents CARTOS, a charging-aware real-time operating system designed to enhance the functionality of intermittently-powered batteryless devices (IPDs) for various Internet of Things (IoT) applications. While IPDs offer significant advantages such as extended lifespan and operability in extreme environments, they pose unique challenges, including the need to ensure forward progress of program execution amidst variable energy availability and to maintain reliable real-time behavior during power disruptions. To address these challenges, CARTOS introduces a mixed-preemption scheduling model that classifies tasks into computational and peripheral tasks, and ensures their efficient and timely execution by adopting just-in-time checkpointing for divisible computation tasks and uninterrupted execution for indivisible peripheral tasks. CARTOS also supports processing chains of tasks with precedence constraints and adapts its scheduling in response to environmental changes to offer continuous execution under diverse conditions. CARTOS is implemented with new APIs and components added to FreeRTOS but is designed for portability to other embedded RTOSs. Through real hardware experiments and simulations, CARTOS exhibits superior performance over state-of-the-art methods, demonstrating that it can serve as a practical platform for developing resilient, real-time sensing applications on IPDs.
CARTOS: A Charging-Aware Real-Time Operating System for Intermittent Batteryless Devices. arXiv - CS - Operating Systems, 2023-11-13. https://doi.org/arxiv-2311.07227
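The mixed-preemption idea in the CARTOS abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the CARTOS scheduler: the `Task` class, the `run_on_charge` function, the abstract "energy units", and the `reserve` parameter are all hypothetical names introduced here. It shows a divisible compute task making partial progress and saving it before the energy runs out, while an indivisible peripheral task starts only when the whole operation fits in the available energy.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    kind: str      # "compute" (divisible) or "peripheral" (indivisible)
    cost: int      # energy units needed to finish the task
    done: int = 0  # energy units of work already completed and saved


def run_on_charge(task: Task, energy: int, reserve: int = 1) -> int:
    """Run `task` with `energy` units available and return the energy left.

    Hypothetical policy: a peripheral task runs only if all of its remaining
    work fits in the available energy. A compute task runs until only
    `reserve` units remain; those units stand in for the cost of a
    just-in-time checkpoint and are spent only if the task is unfinished.
    """
    remaining = task.cost - task.done
    if task.kind == "peripheral":
        if energy >= remaining:
            task.done = task.cost
            return energy - remaining
        return energy  # not enough charge: do not start, wait for more

    usable = max(0, energy - reserve)
    step = min(usable, remaining)
    task.done += step
    left = energy - step
    if task.done < task.cost:
        left -= min(reserve, left)  # pay for the checkpoint
    return left
```

For example, a compute task with `cost=5` given 4 units twice finishes on the second charge, whereas a peripheral task with `cost=3` given only 2 units does not start at all and leaves the energy untouched.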
The seL4 microkernel is currently the only kernel that has been fully formally verified. In general, the increased interest in ensuring the security of a kernel's code results from its important role in the entire operating system. One of the basic features of an operating system is that it abstracts the handling of devices. This abstraction is represented by device drivers - the software that manages the hardware. A proper verification of the software component could ensure that the device would work properly unless there is a hardware failure. In this paper, we choose to model the behavior of a device driver and build the proof that the code implementation matches the expected behavior. The proof was written in Isabelle/HOL; the code translation from C to Isabelle was done automatically using the C-to-Isabelle Parser and AutoCorres tools. We choose the Isabelle theorem prover because its efficiency was already shown through the verification of the seL4 microkernel.
OpenBSD formal driver verification with SeL4. Adriana Nicolae, Paul Irofti, Ioana Leustean. arXiv - CS - Operating Systems, 2023-11-06. https://doi.org/arxiv-2311.03585
We present here a reverse engineering tool that can be used for information retrieval and anti-malware techniques. Our main contribution is the design and implementation of an instrumentation framework aimed at providing insight into the emulation process. Sample emulation is achieved via translation of the binary code to an intermediate representation followed by compilation and execution. The design makes this a versatile tool that can be used for multiple tasks such as information retrieval, reverse engineering, debugging, and integration with anti-malware products.
Pinky: A Modern Malware-oriented Dynamic Information Retrieval Tool. Paul Irofti. arXiv - CS - Operating Systems, 2023-11-06. https://doi.org/arxiv-2311.03588
Bodun Hu, Le Xu, Jeongyoon Moon, Neeraja J. Yadwadkar, Aditya Akella
Rapid advancements over the years have helped machine learning models reach previously hard-to-achieve goals, sometimes even exceeding human capabilities. However, to attain the desired accuracy, the model sizes and in turn their computational requirements have increased drastically. Thus, serving predictions from these models to meet any target latency and cost requirements of applications remains a key challenge, despite recent work in building inference-serving systems as well as algorithmic approaches that dynamically adapt models based on inputs. In this paper, we introduce a form of dynamism, modality selection, where we adaptively choose modalities from inference inputs while maintaining the model quality. We introduce MOSEL, an automated inference serving system for multi-modal ML models that carefully picks input modalities per request based on user-defined performance and accuracy requirements. MOSEL exploits modality configurations extensively, improving system throughput by 3.6$\times$ with an accuracy guarantee and shortening job completion times by 11$\times$.
MOSEL: Inference Serving Using Dynamic Modality Selection. arXiv - CS - Operating Systems, 2023-10-27. https://doi.org/arxiv-2310.18481
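Per-request modality selection, as described in the MOSEL abstract, can be sketched as a lookup over profiled configurations. This is a minimal illustration and not MOSEL's actual algorithm: the `PROFILE` table, its accuracy and latency numbers, and the `select_modalities` function are all made up for the example. It assumes each modality subset has been profiled offline and picks the fastest subset that still meets the request's accuracy floor and latency budget.

```python
from typing import Optional, Tuple

# Hypothetical offline profile: modality subset -> (accuracy, latency in ms).
# The numbers are illustrative only.
PROFILE = {
    ("audio",): (0.80, 10.0),
    ("video",): (0.88, 40.0),
    ("audio", "video"): (0.93, 55.0),
}


def select_modalities(min_accuracy: float,
                      max_latency_ms: float) -> Optional[Tuple[str, ...]]:
    """Return the lowest-latency modality subset that satisfies both the
    accuracy floor and the latency budget, or None if none qualifies."""
    feasible = [
        (latency, modalities)
        for modalities, (accuracy, latency) in PROFILE.items()
        if accuracy >= min_accuracy and latency <= max_latency_ms
    ]
    if not feasible:
        return None
    # Tuples sort by latency first, so min() picks the fastest feasible subset.
    return min(feasible)[1]
```

With this table, a request that needs at least 0.85 accuracy within 100 ms is served from video alone, and a request that needs 0.90 accuracy within 50 ms has no feasible configuration.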
The Internet of Things (IoT) is becoming an integral part of our modern lives as we converge towards a world surrounded by ubiquitous connectivity. The inherent complexity of the vast IoT ecosystem results in an insufficient understanding of individual system components and their interactions, leading to numerous security challenges. In order to create a secure IoT platform from the ground up, there is a need for a unifying operating system (OS) that can act as a cornerstone regulating the development of stable and secure solutions. In this paper, we present a classification of the security challenges stemming from the manifold aspects of IoT development. We also specify security requirements to direct the secure development of a unifying IoT OS to resolve many of those ensuing challenges. A survey of several modern IoT OSs confirms that while the developers of the OSs have taken many alternative approaches to implement security, we are far from engineering an adequately secure and unified architecture. More broadly, the study presented in this paper can help address the growing need for a secure and unified platform on which to base IoT development and assure the safe, secure, and reliable operation of IoT in critical domains.
A Survey of the Security Challenges and Requirements for IoT Operating Systems. Alvi Jawad. arXiv - CS - Operating Systems, 2023-10-27. https://doi.org/arxiv-2310.19825
Scott Buckley (UNSW Sydney), Robert Sison (UNSW Sydney, University of Melbourne), Nils Wistoff (ETH Zürich), Curtis Millar (UNSW Sydney), Toby Murray (University of Melbourne), Gerwin Klein (Proofcraft, UNSW Sydney), Gernot Heiser (UNSW Sydney)
Microarchitectural timing channels are a major threat to computer security. A set of OS mechanisms called time protection was recently proposed as a principled way of preventing information leakage through such channels and prototyped in the seL4 microkernel. We formalise time protection and the underlying hardware mechanisms in a way that allows linking them to the information-flow proofs that showed the absence of storage channels in seL4.
Proving the Absence of Microarchitectural Timing Channels. arXiv - CS - Operating Systems, 2023-10-25. https://doi.org/arxiv-2310.17046
Suyash Mahar, Mingyao Shen, Terence Kelly, Steven Swanson
Crash consistency using persistent memory programming libraries requires programmers to use complex transactions and manual annotations. In contrast, the failure-atomic msync() (FAMS) interface is much simpler as it transparently tracks updates and guarantees that modified data is atomically durable on a call to the failure-atomic variant of msync(). However, FAMS suffers from several drawbacks, like the overhead of msync() and the write amplification from page-level dirty data tracking. To address these drawbacks while preserving the advantages of FAMS, we propose Snapshot, an efficient userspace implementation of FAMS. Snapshot uses compiler-based annotation to transparently track updates in userspace and syncs them with the backing byte-addressable storage copy on a call to msync(). By keeping a copy of application data in DRAM, Snapshot improves access latency. Moreover, with automatic tracking and syncing changes only on a call to msync(), Snapshot provides crash-consistency guarantees, unlike the POSIX msync() system call. For a KV-Store backed by Intel Optane running the YCSB benchmark, Snapshot achieves at least 1.2$\times$ speedup over PMDK while significantly outperforming conventional (non-crash-consistent) msync(). On an emulated CXL memory semantic SSD, Snapshot outperforms PMDK by up to 10.9$\times$ on all but one YCSB workload, where PMDK is 1.2$\times$ faster than Snapshot. Further, Kyoto Cabinet commits perform up to 8.0$\times$ faster with Snapshot than its built-in, msync()-based crash-consistency mechanism.
Snapshot: Fast, Userspace Crash Consistency for CXL and PM Using msync. arXiv - CS - Operating Systems, 2023-10-25. https://doi.org/arxiv-2310.16300
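The programming model that the Snapshot abstract builds on (store into a memory mapping, then call msync() to push the changes to the backing file) can be shown with Python's standard `mmap` module. This sketch illustrates only the conventional, non-failure-atomic pattern; it is not Snapshot's implementation. The helper names `update_and_sync` and `demo` are made up for the example, and `mmap.flush()` is used as a stand-in for msync(), which is how CPython implements it on POSIX systems.

```python
import mmap
import os
import tempfile


def update_and_sync(path: str, offset: int, payload: bytes) -> None:
    """Write `payload` into a file-backed mapping, then flush it.

    The slice assignment is an ordinary in-memory store. Durability is only
    requested at flush(). Unlike a failure-atomic msync(), a crash before or
    during the flush may leave the file with a partial update.
    """
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mapping:  # length 0 maps the whole file
            mapping[offset:offset + len(payload)] = payload
            mapping.flush()


def demo() -> bytes:
    """Create a one-page scratch file, update it, and read the bytes back."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * 4096)
        os.close(fd)
        update_and_sync(path, 8, b"hello")
        with open(path, "rb") as f:
            f.seek(8)
            return f.read(5)
    finally:
        os.remove(path)
```

The appeal of the FAMS interface described above is that application code keeps this same shape, with the sync call additionally guaranteeing that all tracked updates become durable atomically.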
Yecheng Yang, Pu Pang, Jiawen Wang, Quan Chen, Minyi Guo
Heterogeneous multi-core architectures, co-location, and virtualization are three important data-center technologies that can reduce server power consumption and improve system utilization. This article explores the scheduling strategy of Emulator threads within virtual machine processes when multiple virtual machines are co-located on heterogeneous multi-core architectures. In this co-location scenario, the scheduling strategy for Emulator threads significantly affects the performance of virtual machines. This article is the first in the relevant field to focus on this type of thread. The article finds that the scheduling latency metric is a good indicator of the running status of the vCPU threads and Emulator threads in a virtualized environment, and applies this metric to the design of the scheduling strategy. It presents an Emulator thread scheduler based on heuristic rules, which, in coordination with the host operating system's scheduler, dynamically adjusts the scheduling scope of Emulator threads to improve the overall performance of virtual machines. In real application scenarios, the scheduler effectively improves the performance of applications within virtual machines, with a maximum performance improvement of 40.7%.
Adaptive CPU Resource Allocation for Emulator in Kernel-based Virtual Machine. arXiv - CS - Operating Systems, 2023-10-23. https://doi.org/arxiv-2310.14741
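A latency-driven heuristic of the kind the Emulator-thread abstract describes might look like the following. The paper's actual rules are not given in the abstract, so this is an assumed illustration: the function `adjust_scope`, its parameters, and the 200 µs threshold are hypothetical. The rule widens the set of CPUs on which Emulator threads may run when their scheduling latency is high, and narrows it when vCPU threads are the ones waiting.

```python
from typing import List


def adjust_scope(scope: List[int], candidates: List[int],
                 emu_latency_us: float, vcpu_latency_us: float,
                 high_us: float = 200.0) -> List[int]:
    """Return a new CPU set (scheduling scope) for the Emulator threads.

    Hypothetical rules:
      * Emulator latency above `high_us`: add the first candidate CPU that
        is not yet in the scope, if any.
      * Otherwise, vCPU latency above `high_us`: give back the most recently
        added CPU, but always keep at least one.
      * Otherwise: leave the scope unchanged.
    """
    if emu_latency_us > high_us:
        for cpu in candidates:
            if cpu not in scope:
                return scope + [cpu]
    elif vcpu_latency_us > high_us and len(scope) > 1:
        return scope[:-1]
    return list(scope)
```

On Linux, a returned scope could then be applied to a thread with `os.sched_setaffinity(tid, scope)`. That step is left out of the sketch so that it runs on any platform.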
This paper presents GMEM, generalized memory management, for peripheral devices. GMEM provides OS support for centralized memory management of both CPU and devices. GMEM provides a high-level interface that decouples MMU-specific functions. Device drivers can thus attach themselves to a process's address space and let the OS take charge of their memory management. This eliminates the need for device drivers to "reinvent the wheel" and allows them to benefit from general memory optimizations integrated by GMEM. Furthermore, GMEM internally coordinates all attached devices within each virtual address space. This drastically improves user-level programmability, since programmers can use a single address space within their program, even when operating across the CPU and multiple devices. A case study on device drivers demonstrates these benefits. A GMEM-based IOMMU driver eliminates around seven hundred lines of code and obtains 54% higher network receive throughput utilizing 32% less CPU compared to the state-of-the-art. In addition, the GMEM-based driver of a simulated GPU takes less than 70 lines of code, excluding its MMU functions.
GMEM: Generalized Memory Management for Peripheral Devices. Weixi Zhu, Alan L. Cox, Scott Rixner. arXiv - CS - Operating Systems, 2023-10-19. https://doi.org/arxiv-2310.12554