
Latest publications from ACM Sigplan Notices

Minnow
Q1 Computer Science | Pub Date: 2018-11-30 | DOI: 10.1145/3296957.3173197
Dan Zhang, Xiaoyu Ma, Michael Thomson, Derek Chiou
The importance of irregular applications such as graph analytics is rapidly growing with the rise of Big Data. However, parallel graph workloads tend to perform poorly on general-purpose chip multiprocessors (CMPs) due to poor cache locality, low compute intensity, frequent synchronization, uneven task sizes, and dynamic task generation. At high thread counts, execution time is dominated by worklist synchronization overhead and cache misses. Researchers have proposed hardware worklist accelerators to address scheduling costs, but these proposals often harden a specific scheduling policy and do not address high cache miss rates. We address this with Minnow, a technique that augments each core in a CMP with a lightweight Minnow accelerator. Minnow engines offload worklist scheduling from worker threads to improve scalability. The engines also perform worklist-directed prefetching, a technique that exploits knowledge of upcoming tasks to issue nearly perfectly accurate and timely prefetch operations. On a simulated 64-core CMP running a parallel graph benchmark suite, Minnow improves scalability and reduces L2 cache misses from 29 to 1.2 MPKI on average, resulting in 6.01x average speedup over an optimized software baseline for only 1% area overhead.
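To make worklist-directed prefetching concrete, here is a minimal software sketch (our illustration, with hypothetical names and a hand-tuned prefetch distance; Minnow issues these prefetches from a lightweight per-core engine rather than from the worker thread, and the sketch assumes GCC/Clang's __builtin_prefetch):

```cpp
#include <cstddef>
#include <vector>

struct Vertex { float rank; int degree; };

// Consume a worklist of vertex ids while prefetching the data that a task a
// fixed distance ahead will touch. Because the worklist names the exact
// vertex each upcoming task reads, the prefetches are accurate and timely.
void process_worklist(const std::vector<int>& worklist,
                      std::vector<Vertex>& vertices,
                      void (*visit)(Vertex&)) {
    constexpr std::size_t kPrefetchDistance = 8;  // illustrative tuning
    for (std::size_t i = 0; i < worklist.size(); ++i) {
        if (i + kPrefetchDistance < worklist.size())
            __builtin_prefetch(&vertices[worklist[i + kPrefetchDistance]]);
        visit(vertices[worklist[i]]);
    }
}
```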
Citations: 2
Potluck
Q1 Computer Science | Pub Date: 2018-11-30 | DOI: 10.1145/3296957.3173185
Peizhen Guo, Wenjun Hu
Emerging mobile applications, such as cognitive assistance and augmented reality (AR) based gaming, are increasingly computation-intensive and latency-sensitive, while running on resource-constrained devices. The standard approaches to addressing these challenges involve either offloading to a cloud(let) or local system optimizations to speed up the computation, often trading off computation quality for low latency. Instead, we observe that these applications often operate on similar input data from the camera feed and share common processing components, both within the same (type of) application and across different ones. Therefore, deduplicating processing across applications could deliver the best of both worlds. In this paper, we present Potluck to achieve approximate deduplication. At the core of the system is a cache service that stores and shares processing results between applications, and a set of algorithms to process the input data to maximize deduplication opportunities. This is implemented as a background service on Android. Extensive evaluation shows that Potluck can reduce the processing latency for our AR and vision workloads by a factor of 2.5 to 10.
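A minimal sketch of the approximate-deduplication idea, using a fixed quantization step as the key function (the class and constants below are our invention; Potluck's cache service also adapts its key function and manages eviction):

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

class ApproxCache {
public:
    // Similar inputs quantize to the same key, so a near-duplicate camera
    // frame reuses the cached result instead of being reprocessed.
    std::optional<std::string> lookup(const std::vector<float>& features) const {
        auto it = cache_.find(quantize(features));
        if (it == cache_.end()) return std::nullopt;
        return it->second;
    }
    void insert(const std::vector<float>& features, std::string result) {
        cache_[quantize(features)] = std::move(result);
    }

private:
    // Bucket each feature coarsely, then hash the buckets (FNV-1a style).
    static std::uint64_t quantize(const std::vector<float>& f) {
        std::uint64_t h = 1469598103934665603ull;  // FNV offset basis
        for (float v : f) {
            auto bucket = static_cast<std::int64_t>(v / 0.25f);  // coarse bin
            h = (h ^ static_cast<std::uint64_t>(bucket)) * 1099511628211ull;
        }
        return h;
    }
    std::unordered_map<std::uint64_t, std::string> cache_;
};
```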
Citations: 1
NEOFog
Q1 Computer Science | Pub Date: 2018-11-30 | DOI: 10.1145/3296957.3177154
Kaisheng Ma, Xueqing Li, M. Kandemir, J. Sampson, V. Narayanan, Jinyang Li, Tongda Wu, Zhibo Wang, Yongpan Liu, Yuan Xie
Nonvolatile processors have emerged as one of the promising solutions for energy-harvesting scenarios, among which Wireless Sensor Networks (WSNs) provide some of the most important applications. In a typical distributed sensing system, due to differences in location, energy-harvester angle, power source, etc., different nodes may have different amounts of energy ready for use. While prior approaches have examined these challenges, they have not done so in the context of the features offered by nonvolatile computing approaches, which disrupt certain foundational assumptions. We propose a new set of nonvolatility-exploiting optimizations and embody them in the NEOFog system architecture. We discuss shifts in the tradeoffs in data and program distribution for nonvolatile-processing-based WSNs, showing how nonvolatile processing and nonvolatile RF support alter the benefits of computation- and communication-centric approaches. We also propose a new algorithm, specific to nonvolatile sensing systems, for load balancing both computation and communication demands. Collectively, the NV-aware optimizations in NEOFog increase the ability to perform in-fog processing by 4.2X, and can increase this to 8X if virtualized nodes are 3X multiplexed.
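The load-balancing problem described above can be illustrated with a simple energy-aware placement heuristic (this greedy rule is our stand-in, not the paper's algorithm, and it assumes every node reports a positive energy budget):

```cpp
#include <cstddef>
#include <vector>

struct Node {
    double energy;             // harvested energy available, in joules (> 0)
    double queued_cost = 0.0;  // work already assigned to this node
};

// Assign each task to the node with the lowest queued-work-per-joule, so
// energy-rich nodes absorb proportionally more computation.
std::vector<std::size_t> place_tasks(const std::vector<double>& task_costs,
                                     std::vector<Node>& nodes) {
    std::vector<std::size_t> assignment(task_costs.size());
    for (std::size_t t = 0; t < task_costs.size(); ++t) {
        std::size_t best = 0;
        for (std::size_t n = 1; n < nodes.size(); ++n)
            if (nodes[n].queued_cost / nodes[n].energy <
                nodes[best].queued_cost / nodes[best].energy)
                best = n;
        nodes[best].queued_cost += task_costs[t];
        assignment[t] = best;
    }
    return assignment;
}
```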
Citations: 2
VAULT
Q1 Computer Science | Pub Date: 2018-11-30 | DOI: 10.1145/3296957.3177155
Meysam Taassori, Ali Shafiee, R. Balasubramonian
Intel's SGX offers state-of-the-art security features, including confidentiality, integrity, and authentication (CIA) when accessing sensitive pages in memory. Sensitive pages are placed in an Enclave Page Cache (EPC) within the physical memory before they can be accessed by the processor. To control the overheads imposed by CIA guarantees, the EPC operates with a limited capacity (currently 128 MB). Because of this limited EPC size, sensitive pages must be frequently swapped between EPC and non-EPC regions in memory. A page swap is expensive (about 40K cycles) because it requires an OS system call, page copying, updates to integrity trees and metadata, etc. Our analysis shows that the paging overhead can slow the system on average by 5×, and other studies have reported even higher slowdowns for memory-intensive workloads. The paging overhead can be reduced by growing the size of the EPC to match the size of physical memory, while allowing the EPC to also accommodate non-sensitive pages. However, at least two important problems must be addressed to enable this growth in EPC: (i) the depth of the integrity tree and its cacheability must be improved to keep memory bandwidth overheads in check, and (ii) the space overheads of integrity verification (tree and MACs) must be reduced. We achieve both goals by introducing a variable arity unified tree (VAULT) organization that is more compact and has lower depth. We further reduce the space overheads with techniques that combine MAC sharing and compression. With simulations, we show that the combination of our techniques can address most inefficiencies in SGX memory access and improve overall performance by 3.7×, relative to an SGX baseline, while incurring a memory capacity overhead of only 4.7%.
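The depth argument is easy to check with back-of-the-envelope arithmetic: an integrity tree over N blocks with fan-out (arity) k has depth ceil(log_k N), so widening the tree nodes shrinks the number of levels each access must verify. The numbers below are illustrative, not taken from the paper:

```cpp
#include <cmath>
#include <cstdio>

// Depth of an integrity tree covering `blocks` leaves with fan-out `arity`.
int tree_depth(double blocks, double arity) {
    return static_cast<int>(std::ceil(std::log(blocks) / std::log(arity)));
}

int main() {
    const double blocks = (16.0 * (1ull << 30)) / 64.0;  // 16 GB of 64 B blocks
    std::printf("arity  8: depth %d\n", tree_depth(blocks, 8.0));   // 10 levels
    std::printf("arity 32: depth %d\n", tree_depth(blocks, 32.0));  //  6 levels
}
```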
Citations: 14
LATR
Q1 Computer Science | Pub Date: 2018-11-30 | DOI: 10.1145/3296957.3173198
Mohan Kumar, Steffen Maass, Sanidhya Kashyap, J. Veselý, Zi Yan, Taesoo Kim, A. Bhattacharjee, T. Krishna
We propose LATR (lazy TLB coherence), a software-based TLB shootdown mechanism that can alleviate the overhead of the synchronous TLB shootdown mechanism in existing operating systems. By handling TLB coherence in a lazy fashion, LATR can avoid the expensive IPIs required for delivering a shootdown signal to remote cores, as well as the performance overhead of the associated interrupt handlers. Therefore, virtual memory operations, such as free and page migration operations, can benefit significantly from LATR's mechanism. For example, LATR improves the latency of munmap() by 70.8% on a 2-socket machine, a widely used configuration in modern data centers. Real-world, performance-critical applications such as web servers can also benefit from LATR: without any application-level changes, LATR improves Apache by 59.9% compared to Linux, and by 37.9% compared to ABIS, a highly optimized, state-of-the-art TLB coherence technique.
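A user-level model of the lazy mechanism, reduced to one pending entry per core (illustrative only; the real LATR keeps per-core state pages inside the kernel and drains them at natural points such as context switches and timer ticks):

```cpp
#include <array>
#include <atomic>
#include <cstdint>
#include <vector>

constexpr int kCores = 4;

// Per-core mailbox holding one stale virtual page awaiting invalidation.
std::array<std::atomic<std::uint64_t>, kCores> pending_page{};

// munmap() path: publish the stale page to every remote core and return
// immediately, instead of sending IPIs and waiting for acknowledgments.
void lazy_shootdown(std::uint64_t vpage) {
    for (int c = 0; c < kCores; ++c)
        pending_page[c].store(vpage, std::memory_order_release);
}

// Run by each core at its next flush point: drain the mailbox and invalidate
// the local TLB entry (modeled here by recording the page number).
void flush_point(int core, std::vector<std::uint64_t>& invalidated) {
    std::uint64_t v = pending_page[core].exchange(0, std::memory_order_acq_rel);
    if (v != 0) invalidated.push_back(v);  // stand-in for invlpg(v)
}
```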
Citations: 1
CALOREE
Q1 Computer Science | Pub Date: 2018-11-30 | DOI: 10.1145/3296957.3173184
Nikita Mishra, Connor Imes, J. Lafferty, H. Hoffmann
Many modern computing systems must provide reliable latency with minimal energy. Two central challenges arise when allocating system resources to meet these conflicting goals: (1) complexity: modern hardware exposes diverse resources with complicated interactions, and (2) dynamics: latency must be maintained despite unpredictable changes in operating environment or input. Machine learning accurately models the latency of complex, interacting resources, but does not address system dynamics; control theory adjusts to dynamic changes, but struggles with complex resource interactions. We therefore propose CALOREE, a resource manager that learns key control parameters to meet latency requirements with minimal energy in complex, dynamic environments. CALOREE breaks resource allocation into two sub-tasks: learning how interacting resources affect speedup, and controlling speedup to meet latency requirements with minimal energy. CALOREE defines a general control system whose parameters are customized by a learning framework while maintaining control-theoretic formal guarantees that the latency goal will be met. We test CALOREE's ability to deliver reliable latency on heterogeneous ARM big.LITTLE architectures in both single- and multi-application scenarios. Compared to the best prior learning and control solutions, CALOREE reduces deadline misses by 60% and energy consumption by 13%.
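The controller half of this split can be sketched in a few lines (the control law and pole value below are our illustration; in CALOREE a learned model supplies the resource configuration that realizes the requested speedup, and its variance informs the pole):

```cpp
#include <algorithm>

// Feedback loop that converts a latency error into a required speedup; a
// learned model (not shown) then picks the resource configuration that
// delivers this speedup at least energy.
struct SpeedupController {
    double pole = 0.1;     // 0 = deadbeat response; larger values respond
                           // more slowly but tolerate noisier models
    double speedup = 1.0;  // currently commanded speedup

    double update(double measured_latency, double target_latency) {
        double error = measured_latency / target_latency;  // >1 means too slow
        speedup = std::max(1.0, pole * speedup +
                                (1.0 - pole) * speedup * error);
        return speedup;
    }
};
```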
Citations: 4
SPECTR
Q1 Computer Science | Pub Date: 2018-11-30 | DOI: 10.1145/3296957.3173199
A. Rahmani, Bryan Donyanavard, T. Mück, Kasra Moazzemi, A. Jantsch, O. Mutlu, N. Dutt
Resource management strategies for many-core systems need to enable sharing of resources such as power, processing cores, and memory bandwidth while coordinating the priority and significance of system- and application-level objectives at runtime in a scalable and robust manner. State-of-the-art approaches use heuristics or machine learning for resource management, but unfortunately lack formalism in providing robustness against unexpected corner cases. While recent efforts deploy classical control-theoretic approaches with some guarantees and formalism, they lack scalability and autonomy to meet changing runtime goals. We present SPECTR, a new resource management approach for many-core systems that leverages formal supervisory control theory (SCT) to combine the strengths of classical control theory with state-of-the-art heuristic approaches to efficiently meet changing runtime goals. SPECTR is a scalable and robust control architecture and a systematic design flow for hierarchical control of many-core systems. SPECTR leverages SCT techniques such as gain scheduling to allow autonomy for individual controllers. It facilitates automatic synthesis of the high-level supervisory controller and its property verification. We implement SPECTR on an Exynos platform containing ARM's big.LITTLE-based heterogeneous multi-processor (HMP) and demonstrate that SPECTR's use of SCT is key to managing multiple interacting resources (e.g., chip power and processing cores) in the presence of competing objectives (e.g., satisfying QoS vs. power capping). The principles of SPECTR are easily applicable to any resource type and objective as long as the management problem can be modeled using dynamical systems theory (e.g., difference equations), discrete-event dynamic systems, or fuzzy dynamics.
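A gain-scheduling supervisor can be sketched as a mode switch over a low-level controller's parameters (this hard-coded table is our illustration; SPECTR synthesizes the supervisor from formal SCT models and verifies its properties rather than relying on fixed rules):

```cpp
struct Gains { double kp, ki; };  // proportional and integral gains

enum class Mode { MeetQoS, CapPower };

// The supervisor watches system-level state and switches the objective; the
// low-level controller keeps running and only its gains change.
Gains supervise(double power_watts, double power_cap_watts) {
    Mode m = (power_watts > power_cap_watts) ? Mode::CapPower : Mode::MeetQoS;
    return (m == Mode::CapPower) ? Gains{0.2, 0.02}   // back off gently
                                 : Gains{0.8, 0.10};  // chase the QoS target
}
```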
Citations: 169
Skyway
Q1 Computer Science | Pub Date: 2018-11-30 | DOI: 10.1145/3296957.3173200
Khanh Nguyen, Lu Fang, Christian Navasca, G. Xu, Brian Demsky, Shan Lu
Managed languages such as Java and Scala are prevalently used in the development of large-scale distributed systems. Under a managed runtime, when performing data transfer across machines, a task frequently conducted in a Big Data system, the system needs to serialize a sea of objects into a byte sequence before sending them over the network. The remote node receiving the bytes then deserializes them back into objects. This process is both performance-inefficient and labor-intensive: (1) object serialization/deserialization makes heavy use of reflection, an expensive runtime operation, and (2) serialization/deserialization functions need to be hand-written and are error-prone. This paper presents Skyway, a JVM-based technique that can directly connect managed heaps of different (local or remote) JVM processes. Under Skyway, objects in the source heap can be directly written into a remote heap without changing their formats. Skyway provides performance benefits to any JVM-based system by completely eliminating the need (1) to invoke serialization/deserialization functions, thus saving CPU time, and (2) for developers to hand-write serialization functions.
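The core idea translates to any setting with a stable object layout: copy the object graph as raw bytes and fix up its internal pointers against the receiver's heap base, rather than walking fields reflectively. A C++ sketch of that fix-up on a linked list (our illustration; Skyway itself operates on JVM-managed heaps):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Node { std::int64_t value; Node* next; };

// Sender: copy list nodes into one contiguous buffer, rewriting each next
// pointer as the index of its successor (always nonzero, so nullptr still
// marks the tail). The bytes are now position-independent.
std::vector<Node> pack(const Node* head) {
    std::vector<Node> out;
    for (const Node* p = head; p != nullptr; p = p->next) out.push_back(*p);
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i].next = (i + 1 < out.size()) ? reinterpret_cast<Node*>(i + 1)
                                           : nullptr;
    return out;
}

// Receiver: one pointer fix-up pass against the new base; no field-by-field
// deserialization and no reflection.
void unpack(std::vector<Node>& buf) {
    for (Node& n : buf)
        if (n.next != nullptr)
            n.next = buf.data() + reinterpret_cast<std::uintptr_t>(n.next);
}
```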
Citations: 7
Proceedings of the 14th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, VEE 2018, Williamsburg, VA, USA, March 25-25, 2018
Q1 Computer Science | Pub Date: 2018-11-30 | DOI: 10.1145/3296975
{"title":"Proceedings of the 14th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, VEE 2018, Williamsburg, VA, USA, March 25-25, 2018","authors":"","doi":"10.1145/3296975","DOIUrl":"https://doi.org/10.1145/3296975","url":null,"abstract":"","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":"14 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89091618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, Philadelphia, PA, USA, June 18-22, 2018
Q1 Computer Science | Pub Date: 2018-11-30 | DOI: 10.1145/3296979
{"title":"Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, Philadelphia, PA, USA, June 18-22, 2018","authors":"","doi":"10.1145/3296979","DOIUrl":"https://doi.org/10.1145/3296979","url":null,"abstract":"","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74589693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2