Bringing Engineering Rigor to Deep Learning
Kexin Pei, Shiqi Wang, Yuchi Tian, J. Whitehouse, Carl Vondrick, Yinzhi Cao, Baishakhi Ray, S. Jana, Junfeng Yang
Deep learning (DL) systems are increasingly deployed in safety- and security-critical domains including autonomous driving, robotics, and malware detection, where the correctness and predictability of a system on corner-case inputs are of great importance. Unfortunately, the common practice for validating a deep neural network (DNN), measuring overall accuracy on a randomly selected test set, is not designed to surface corner-case errors. As recent work shows, even DNNs with state-of-the-art accuracy are easily fooled by human-imperceptible, adversarial perturbations to the inputs. Questions such as how to test corner-case behaviors more thoroughly and whether all adversarial samples have been found remain unanswered. In the last few years, we have been working on bringing more engineering rigor into deep learning. Towards this goal, we have built five systems to test DNNs more thoroughly and verify the absence of adversarial samples for given datasets. These systems check a broad spectrum of properties (e.g., rotating an image should never change its classification) and find thousands of error-inducing samples for popular DNNs in critical domains (e.g., ImageNet, autonomous driving, and malware detection). Our DNN verifiers are also orders of magnitude (e.g., 5,000×) faster than similar tools. This article overviews our systems and discusses three open research challenges, in the hope of inspiring more future research on testing and verifying DNNs.
{"title":"Bringing Engineering Rigor to Deep Learning","authors":"Kexin Pei, Shiqi Wang, Yuchi Tian, J. Whitehouse, Carl Vondrick, Yinzhi Cao, Baishakhi Ray, S. Jana, Junfeng Yang","doi":"10.1145/3352020.3352030","DOIUrl":"https://doi.org/10.1145/3352020.3352030","url":null,"abstract":"Deep learning (DL) systems are increasingly deployed in safety- and security-critical domains including autonomous driving, robotics, and malware detection, where the correctness and predictability of a system on corner-case inputs are of great importance. Unfortunately, the common practice to validating a deep neural network (DNN) - measuring overall accuracy on a randomly selected test set - is not designed to surface corner-case errors. As recent work shows, even DNNs with state-of-the-art accuracy are easily fooled by human-imperceptible, adversarial perturbations to the inputs. Questions such as how to test corner-case behaviors more thoroughly and whether all adversarial samples have been found remain unanswered. In the last few years, we have been working on bringing more engineering rigor into deep learning. Towards this goal, we have built five systems to test DNNs more thoroughly and verify the absence of adversarial samples for given datasets. These systems check a broad spectrum of properties (e.g., rotating an image should never change its classification) and find thousands of error-inducing samples for popular DNNs in critical domains (e.g., ImageNet, autonomous driving, and malware detection). Our DNN verifiers are also orders of magnitude (e.g., 5,000×) faster than similar tools. This article overviews our systems and discusses three open research challenges to hopefully inspire more future research towards testing and verifying DNNs.","PeriodicalId":38935,"journal":{"name":"Operating Systems Review (ACM)","volume":"53 1","pages":"59 - 67"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3352020.3352030","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47345932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Speculative Symbolic Graph Execution of Imperative Deep Learning Programs
Eunji Jeong, Sungwoo Cho, Gyeong-In Yu, Joo Seong Jeong, Dongjin Shin, Taebum Kim, Byung-Gon Chun
The rapid evolution of deep neural networks demands that deep learning (DL) frameworks not only execute large computations quickly, but also support straightforward programming models for quickly implementing and experimenting with complex network structures. However, existing frameworks fail to excel in both areas simultaneously, leading to divergent efforts to optimize performance and to improve usability. This paper presents JANUS, a system that combines the advantages of both sides by transparently converting an imperative DL program written in Python, the de facto scripting language for DL, into an efficiently executable symbolic dataflow graph. JANUS can convert various dynamic features of Python, including dynamic control flow, dynamic types, and impure functions, into elements of a symbolic dataflow graph. Our experiments show that JANUS achieves fast DL training by exploiting the techniques of symbolic graph-based DL frameworks, while maintaining the simple and flexible programmability of imperative DL frameworks.
{"title":"Speculative Symbolic Graph Execution of Imperative Deep Learning Programs","authors":"Eunji Jeong, Sungwoo Cho, Gyeong-In Yu, Joo Seong Jeong, Dongjin Shin, Taebum Kim, Byung-Gon Chun","doi":"10.1145/3352020.3352025","DOIUrl":"https://doi.org/10.1145/3352020.3352025","url":null,"abstract":"The rapid evolution of deep neural networks is demanding deep learning (DL) frameworks not only to satisfy the requirement of quickly executing large computations, but also to support straightforward programming models for quickly implementing and experimenting with complex network structures. However, existing frameworks fail to excel in both departments simultaneously, leading to diverged efforts for optimizing performance and improving usability. This paper presents JANUS, a system that combines the advantages from both sides by transparently converting an imperative DL program written in Python, a de-facto scripting language for DL, into an efficiently executable symbolic dataflow graph. JANUS can convert various dynamic features of Python, including dynamic control flow, dynamic types, and impure functions, into elements of a symbolic dataflow graph. Our experiments show that JANUS can achieve fast DL training by exploiting the techniques imposed by symbolic graph-based DL frameworks, while maintaining the simple and flexible programmability of imperative DL frameworks at the same time.","PeriodicalId":38935,"journal":{"name":"Operating Systems Review (ACM)","volume":"53 1","pages":"26 - 33"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3352020.3352025","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43140122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Learned"
Yiying Zhang, Yutong Huang

With operating systems at the core of computer systems, decades of research and engineering effort have gone into the development of OSes. To keep pace with the evolution of modern hardware and applications, we argue that a different approach should be taken in future OS development. Instead of relying solely on human wisdom, we should also leverage AI and machine learning techniques to automatically "learn" how to build and tune an OS. This paper explores the opportunities and challenges of the "learned" OS approach and makes recommendations for future researchers and practitioners on building such an OS.
{"title":"\"Learned\"","authors":"Yiying Zhang, Yutong Huang","doi":"10.1145/3352020.3352027","DOIUrl":"https://doi.org/10.1145/3352020.3352027","url":null,"abstract":"With operating systems being at the core of computer systems, decades of research and engineering efforts have been put into the development of OSes. To keep pace with the speed of modern hardware and application evolvement, we argue that a different approach should be taken in future OS development. Instead of relying solely on human wisdom, we should also leverage AI and machine learning techniques to automatically \"learn\" how to build and tune an OS. This paper explores the opportunities and challenges of the \"learned\" OS approach and makes recommendation for future researchers and practitioners on building such an OS.","PeriodicalId":38935,"journal":{"name":"Operating Systems Review (ACM)","volume":"53 1","pages":"40 - 45"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3352020.3352027","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"64021780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Artificial Intelligence in Resource-Constrained and Shared Environments
S. Krishnan, Aaron J. Elmore, M. Franklin, John Paparrizos, Zechao Shang, Adam Dziedzic, R. Liu
The computational demands of modern AI techniques are immense, and as the number of practical applications grows, there will be an increasing burden on shared computing infrastructure. We envision a forthcoming era of "AI Systems" research where reducing resource consumption, reasoning about transient resource availability, trading off resource consumption for accuracy, and managing contention on specialized hardware will become the community's main research focus. This paper overviews the history of AI systems research, a vision for the future, and the open challenges ahead.
{"title":"Artificial Intelligence in Resource-Constrained and Shared Environments","authors":"S. Krishnan, Aaron J. Elmore, M. Franklin, John Paparrizos, Zechao Shang, Adam Dziedzic, R. Liu","doi":"10.1145/3352020.3352022","DOIUrl":"https://doi.org/10.1145/3352020.3352022","url":null,"abstract":"The computational demands of modern AI techniques are immense, and as the number of practical applications grows, there will be an increasing burden on shared computing infrastructure. We envision a forthcoming era of \"AI Systems\" research where reducing resource consumption, reasoning about transient resource availability, trading off resource consumption for accuracy, and managing contention on specialized hardware will become the community's main research focus. This paper overviews the history of AI systems research, a vision for the future, and the open challenges ahead.","PeriodicalId":38935,"journal":{"name":"Operating Systems Review (ACM)","volume":"53 1","pages":"1 - 6"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3352020.3352022","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44776599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cloud-Hosted Intelligence for Real-time IoT Applications
K. Birman, B. Hariharan, Christopher De Sa

Deploying machine learning into IoT cloud settings will require an evolution of the cloud infrastructure. In this white paper, we justify this assertion and identify new capabilities needed for real-time intelligent systems. We also outline our initial efforts to create a new edge architecture more suitable for ML. Although the work is still underway, several components exist, and we review them. We then point to open technical problems that will need to be solved as we progress further in this direction.
{"title":"Cloud-Hosted Intelligence for Real-time IoT Applications","authors":"K. Birman, B. Hariharan, Christopher De Sa","doi":"10.1145/3352020.3352023","DOIUrl":"https://doi.org/10.1145/3352020.3352023","url":null,"abstract":"Deploying machine learning into IoT cloud settings will require an evolution of the cloud infrastructure. In this white paper, we justify this assertion and identify new capabilities needed for real-time intelligent systems. We also outline our initial efforts to create a new edge architecture more suitable for ML. Although the work is still underway, several components exist, and we review them. We then point to open technical problems that will need to be solved as we progress further in this direction.","PeriodicalId":38935,"journal":{"name":"Operating Systems Review (ACM)","volume":"53 1","pages":"7 - 13"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3352020.3352023","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43805758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leveraging Deep Learning to Improve Performance Predictability in Cloud Microservices with Seer
Yu Gan, Yanqi Zhang, Kelvin Hu, Dailun Cheng, Yuan He, Meghna Pancholi, Christina Delimitrou

Performance unpredictability is a major roadblock to cloud adoption, and has performance, cost, and revenue ramifications. Predictable performance is even more critical as cloud services transition from monolithic designs to microservices. Detecting QoS violations after they occur in systems with microservices results in long recovery times, as hotspots propagate and amplify across dependent services; Seer instead leverages deep learning to anticipate QoS violations before they manifest.
{"title":"Leveraging Deep Learning to Improve Performance Predictability in Cloud Microservices with Seer","authors":"Yu Gan, Yanqi Zhang, Kelvin Hu, Dailun Cheng, Yuan He, Meghna Pancholi, Christina Delimitrou","doi":"10.1145/3352020.3352026","DOIUrl":"https://doi.org/10.1145/3352020.3352026","url":null,"abstract":"Performance unpredictability is a major roadblock towards cloud adoption, and has performance, cost, and revenue ramifications. Predictable performance is even more critical as cloud services transition from monolithic designs to microservices. Detecting UOS violations after they occur in systems with microservices results in long recovery times, as hotspots propagate and amplify across dependent services.","PeriodicalId":38935,"journal":{"name":"Operating Systems Review (ACM)","volume":"53 1","pages":"34 - 39"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3352020.3352026","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43082359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Taming Hyper-parameters in Deep Learning Systems
Luo Mai, A. Koliousis, Guo Li, A. Brabete, P. Pietzuch
Deep learning (DL) systems expose many tuning parameters ("hyper-parameters") that affect the performance and accuracy of trained models. Users increasingly struggle to configure hyper-parameters and spend a substantial portion of their time tuning them empirically. We argue that future DL systems should be designed to help manage hyper-parameters. We describe how a distributed DL system can (i) remove the impact of hyper-parameters on both performance and accuracy, thus making it easier to decide on a good setting, and (ii) support more powerful dynamic policies for adapting hyper-parameters, which take monitored training metrics into account. We report results from prototype implementations that show the practicality of hyper-parameter-friendly DL system designs.
{"title":"Taming Hyper-parameters in Deep Learning Systems","authors":"Luo Mai, A. Koliousis, Guo Li, A. Brabete, P. Pietzuch","doi":"10.1145/3352020.3352029","DOIUrl":"https://doi.org/10.1145/3352020.3352029","url":null,"abstract":"Deep learning (DL) systems expose many tuning parameters (\"hyper-parameters\") that affect the performance and accuracy of trained models. Increasingly users struggle to configure hyper-parameters, and a substantial portion of time is spent tuning them empirically. We argue that future DL systems should be designed to help manage hyper-parameters. We describe how a distributed DL system can (i) remove the impact of hyper-parameters on both performance and accuracy, thus making it easier to decide on a good setting, and (ii) support more powerful dynamic policies for adapting hyper-parameters, which take monitored training metrics into account. We report results from prototype implementations that show the practicality of DL system designs that are hyper-parameter-friendly.","PeriodicalId":38935,"journal":{"name":"Operating Systems Review (ACM)","volume":"53 1","pages":"52 - 58"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3352020.3352029","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48800132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
When the Power of the Crowd Meets the Intelligence of the Middleware
Yifan Du, V. Issarny, F. Sailhan

The data gluttony of AI is well known: data fuels artificial intelligence. Technologies that help gather the needed data are thus essential, the IoT among them. However, deploying IoT solutions raises significant challenges, especially regarding the resource and financial costs at stake. In our view, mobile crowdsensing, a.k.a. phone sensing, has a major role to play because it can contribute massive data at relatively low cost. Still, crowdsensing is useless, and even harmful, if the contributed data are not properly analyzed. This paper surveys our work on systems that face this challenge, which also illustrates the virtuous circle of AI. We specifically focus on how intelligent crowdsensing middleware leverages on-device machine learning to enhance the reported physical observations.

Keywords: Crowdsensing, Middleware, Online learning.
{"title":"When the Power of the Crowd Meets the Intelligence of the Middleware","authors":"Yifan Du, V. Issarny, F. Sailhan","doi":"10.1145/3352020.3352033","DOIUrl":"https://doi.org/10.1145/3352020.3352033","url":null,"abstract":"The data gluttony of AI is well known: Data fuels the artificial intelligence. Technologies that help to gather the needed data are then essential, among which the IoT. However, the deployment of IoT solutions raises significant challenges, especially regarding the resource and financial costs at stake. It is our view that mobile crowdsensing, aka phone sensing, has a major role to play because it potentially contributes massive data at a relatively low cost. Still, crowdsensing is useless, and even harmful, if the contributed data are not properly analyzed. This paper surveys our work on the development of systems facing this challenge, which also illustrates the virtuous circles of AI. We specifically focus on how intelligent crowdsensing middleware leverages on-device machine learning to enhance the reported physical observations. Keywords: Crowdsensing, Middleware, Online learning.","PeriodicalId":38935,"journal":{"name":"Operating Systems Review (ACM)","volume":"53 1","pages":"85 - 90"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3352020.3352033","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49280302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Morpheus
Hung-Wei Tseng, Qianchen Zhao, Yuxiao Zhou, Mark Gahagan, S. Swanson
In modern computing systems, object deserialization can become a surprisingly important bottleneck: in our tests, a set of general-purpose, highly parallelized applications spends 64% of total execution time deserializing data into objects. This paper presents the Morpheus model, which allows applications to move such computations to a storage device and bypass the overhead on the host system. We use this model to deserialize data into application objects inside storage devices, rather than in the host CPU. Using the Morpheus model for object deserialization avoids unnecessary system overheads, frees up scarce CPU and main memory resources for compute-intensive workloads, saves I/O bandwidth, and reduces power consumption. In heterogeneous, coprocessor-equipped systems, Morpheus allows application objects to be sent directly from a storage device to a co-processor (e.g., a GPU) by peer-to-peer transfer, further improving application performance as well as reducing CPU and main memory utilization. This paper implements Morpheus-SSD, an SSD supporting the Morpheus model. Morpheus-SSD improves the performance of object deserialization by 1.66×, reduces power consumption by 7%, uses 42% less energy, and speeds up total execution time by 1.32×. By using NVMe-P2P, which provides peer-to-peer communication between Morpheus-SSD and a GPU, Morpheus-SSD can speed up total execution time by 1.39× on a heterogeneous computing platform.
{"title":"Morpheus","authors":"Hung-Wei Tseng, Qianchen Zhao, Yuxiao Zhou, Mark Gahagan, S. Swanson","doi":"10.1145/3273982.3273989","DOIUrl":"https://doi.org/10.1145/3273982.3273989","url":null,"abstract":"In modern computing systems, object deserialization can become a surprisingly important bottleneck-in our test, a set of generalpurpose, highly parallelized applications spends 64% of total execution time deserializing data into objects. This paper presents the Morpheus model, which allows applications to move such computations to a storage device and bypass the overhead on the host system. We use this model to deserialize data into application objects inside storage devices, rather than in the host CPU. Using the Morpheus model for object deserialization avoids unnecessary system overheads, frees up scarce CPU and main memory resources for compute-intensive workloads, saves I/O bandwidth, and reduces power consumption. In heterogeneous, coprocessor- equipped systems, Morpheus allows application objects to be sent directly from a storage device to a co-processor (e.g., a GPU) by peer-to-peer transfer, further improving application performance as well as reducing the CPU and main memory utilizations. This paper implements Morpheus-SSD, an SSD supporting the Morpheus model. Morpheus-SSD improves the performance of object deserialization by 1.66x, reduces power consumption by 7%, uses 42% less energy, and speeds up the total execution time by 1.32x. By using NVMe-P2P that realizes peer-to-peer communication between Morpheus-SSD and a GPU, Morpheus-SSD can speed up the total execution time by 1.39x in a heterogeneous computing platform.","PeriodicalId":38935,"journal":{"name":"Operating Systems Review (ACM)","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3273982.3273989","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"64013000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ActivePointers
Sagi Shahar, Shai Bergman, M. Silberstein

Modern discrete GPUs have been the processors of choice for accelerating compute-intensive applications, but using them in large-scale data processing is extremely challenging. Unfortunately, they do not provide important I/O abstractions long established in the CPU context, such as memory-mapped files, which shield programmers from the complexity of buffer and I/O device management. However, implementing these abstractions on GPUs poses a problem: the limited GPU virtual memory system provides no address space management and page fault handling mechanisms to GPU developers, and does not allow modifications to memory mappings for running GPU programs. We implement ActivePointers, a software address translation layer and paging system that introduces native support for page faults and virtual address space management to GPU programs, and enables the implementation of fully functional memory-mapped files on commodity GPUs. Files mapped into GPU memory are accessed using active pointers, which behave like regular pointers but access the GPU page cache under the hood, and trigger page faults which are handled on the GPU. We design and evaluate a number of novel mechanisms, including a translation cache in hardware registers and translation aggregation for deadlock-free page fault handling of threads in a single warp. We extensively evaluate ActivePointers on commodity NVIDIA GPUs using microbenchmarks, and also implement a complex image processing application that constructs a photo collage from a subset of 10 million images stored in a 40GB file. The GPU implementation maps the entire file into GPU memory and accesses it via active pointers. The use of active pointers adds only up to 1% to the application's runtime, while enabling speedups of up to 3.9× over a combined CPU+GPU implementation and 2.6× over a 12-core CPU-only implementation that uses AVX vector instructions.
{"title":"ActivePointers","authors":"Sagi Shahar, Shai Bergman, M. Silberstein","doi":"10.1145/3273982.3273990","DOIUrl":"https://doi.org/10.1145/3273982.3273990","url":null,"abstract":"Modern discrete GPUs have been the processors of choice for accelerating compute-intensive applications, but using them in largescale data processing is extremely challenging. Unfortunately, they do not provide important I/O abstractions long established in the CPU context, such as memory mapped files, which shield programmers from the complexity of buffer and I/O device management. However, implementing these abstractions on GPUs poses a problem: the limited GPU virtual memory system provides no address space management and page fault handling mechanisms to GPU developers, and does not allow modifications to memory mappings for running GPU programs. We implement ActivePointers, a software address translation layer and paging system that introduces native support for page faults and virtual address space management to GPU programs, and enables the implementation of fully functional memory mapped files on commodity GPUs. Files mapped into GPU memory are accessed using active pointers, which behave like regular pointers but access the GPU page cache under the hood, and trigger page faults which are handled on the GPU. We design and evaluate a number of novel mechanisms, including a translation cache in hardware registers and translation aggregation for deadlock-free page fault handling of threads in a single warp. We extensively evaluate ActivePointers on commodity NVIDIA GPUs using microbenchmarks, and also implement a complex image processing application that constructs a photo collage from a subset of 10 million images stored in a 40GB file. The GPU implementation maps the entire file into GPU memory and accesses it via active pointers. The use of active pointers adds only up to 1% to the application's runtime, while enabling speedups of up to 3.9x over a combined CPU+GPU implementation and 2.6x over a 12-core CPU-only implementation which uses AVX vector instructions.","PeriodicalId":38935,"journal":{"name":"Operating Systems Review (ACM)","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78407299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}