Proceedings of the Computing Frontiers Conference: Latest Articles

An Empirical Comparison of Stream Clustering Algorithms

Proceedings of the Computing Frontiers Conference

Pub Date : 2017-05-15 DOI: 10.1145/3075564.3078887

Matthias Carnein, Dennis Assenmacher, H. Trautmann

Analysing streaming data has received considerable attention in recent years. A key research area in this field is stream clustering, which aims to recognize patterns in a possibly unbounded data stream of varying speed and structure. Over the past decades, a multitude of new stream clustering algorithms have been proposed. However, to the best of our knowledge, no rigorous analysis and comparison of the different approaches has been performed. Our paper fills this gap and provides extensive experiments for a total of ten popular algorithms. We use a number of standard data sets, both real and synthetic, and identify key weaknesses and strengths of the existing algorithms.

Citations: 35

High Performance Coordinate Descent Matrix Factorization for Recommender Systems

Proceedings of the Computing Frontiers Conference

Pub Date : 2017-05-15 DOI: 10.1145/3075564.3077625

Xi Yang, Jianbin Fang, Jing Chen, Chengkun Wu, T. Tang, Kai Lu

Coordinate descent (CD) has proven to be an effective technique for matrix factorization (MF) in recommender systems. To speed up factorization, various methods of implementing parallel CDMF have been proposed to leverage modern multi-core CPUs and many-core GPUs. Existing implementations are limited in either speed or portability (constrained to certain platforms). In this paper, we present an efficient and portable CDMF solver for recommender systems. On the one hand, we diagnose the baseline implementation and observe that it lacks awareness of the hierarchical thread organization on modern hardware and of the data variance of the rating matrix. Thus, we apply thread batching and load balancing to achieve high performance. On the other hand, we implement the CDMF solver in OpenCL so that it can run on various platforms. Based on the architectural specifics, we customize code variants to map them efficiently to the underlying hardware. The experimental results show that our implementation performs 2x faster on dual-socket Intel Xeon CPUs and 22x faster on an NVIDIA K20c GPU than the baseline implementations. When taking the CDMF solver as a benchmark, we observe that it runs 2.4x faster on the GPU than on the CPUs, whereas it achieves competitive performance on Intel MIC against the CPUs.
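
For readers unfamiliar with the underlying method, the following is a minimal sequential sketch of coordinate descent for regularized matrix factorization. It is not the authors' OpenCL solver and includes none of the thread batching or load balancing described above. The function names, the triple-based rating format, and the default parameters are assumptions made for illustration only.

```python
import random


def cd_matrix_factorization(ratings, n_users, n_items, rank=2, lam=0.1,
                            sweeps=20, seed=0):
    """Factor a sparse rating matrix as W * H^T by coordinate descent.

    ratings: list of (user, item, value) triples for the observed entries.
    lam must be positive so every coordinate update has a nonzero denominator.
    """
    rng = random.Random(seed)
    W = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_users)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_items)]

    # Index the observed ratings by user and by item.
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))

    def update(target, fixed, neighbours):
        # Exactly minimize the objective over one coordinate at a time,
        # holding every other coordinate fixed.
        for row, observed in enumerate(neighbours):
            for t in range(rank):
                num = 0.0
                den = lam
                for col, r in observed:
                    pred = sum(target[row][s] * fixed[col][s]
                               for s in range(rank))
                    # Residual with coordinate t's own contribution removed.
                    residual = r - pred + target[row][t] * fixed[col][t]
                    num += residual * fixed[col][t]
                    den += fixed[col][t] ** 2
                target[row][t] = num / den

    for _ in range(sweeps):
        update(W, H, by_user)   # update user factors with item factors fixed
        update(H, W, by_item)   # update item factors with user factors fixed
    return W, H


def objective(ratings, W, H, lam):
    """Squared error on observed ratings plus L2 regularization."""
    sq_err = sum((r - sum(a * b for a, b in zip(W[u], H[i]))) ** 2
                 for u, i, r in ratings)
    reg = lam * (sum(v * v for row in W for v in row)
                 + sum(v * v for row in H for v in row))
    return sq_err + reg
```

Because each coordinate update is an exact one-dimensional minimization, the regularized objective never increases from one sweep to the next. A practical implementation would cache residuals rather than recompute each prediction, which this sketch omits for brevity.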

Citations: 6

Using Personality Metrics to Improve Cache Interference Management in Multicore Processors

Proceedings of the Computing Frontiers Conference

Pub Date : 2017-05-15 DOI: 10.1145/3075564.3075591

Mwaffaq Otoom, A. Jaleel, P. Trancoso

The trend of increasing the number of cores in a processor will lead to certain challenges. Among them, more cores issue more memory requests, which in turn increases the competition, or interference, for shared resources such as the Last-Level Cache (LLC). In this work we focus on cache interference while executing Decision Support System queries, a common case in a data center scenario. We study the co-execution of different queries from the TPC-H benchmark using the PostgreSQL DBMS on a multicore with up to 16 cores and different LLC configurations. In addition to the working set metric, to better understand the effects of co-execution, we develop two new "personality" metrics to classify the behavior of the queries in co-execution: the social and sensitive metrics. These metrics can be used to manage the cache interference and thus improve the co-execution performance of the queries.

Citations: 1

BONSEYES: Platform for Open Development of Systems of Artificial Intelligence: Invited Paper

Proceedings of the Computing Frontiers Conference

Pub Date : 2017-05-15 DOI: 10.1145/3075564.3076259

Tim Llewellynn, M. Fernández-Carrobles, O. Déniz-Suárez, Samuel Fricker, A. Storkey, Nuria Pazos, Gordana S. Velikic, Kirsten Leufgen, Rozenn Dahyot, Sebastian Koller, G. Goumas, P. Leitner, Ganesh S. Dasika, Lei Wang, K. Tutschku

The Bonseyes EU H2020 collaborative project aims to develop a platform consisting of a Data Marketplace, a Deep Learning Toolbox, and Developer Reference Platforms for organizations wanting to adopt Artificial Intelligence. The project will focus on using artificial intelligence in low-power Internet of Things (IoT) devices ("edge computing"), embedded computing systems, and data center servers ("cloud computing"). It will bring about orders-of-magnitude improvements in efficiency, performance, reliability, security, and productivity in the design and programming of systems of artificial intelligence that incorporate Smart Cyber-Physical Systems (CPS). In addition, it will solve a causality problem for organizations that lack access to Data and Models. Its open software architecture will facilitate adoption of the whole concept on a wider scale. To evaluate the effectiveness and technical feasibility, and to quantify the real-world improvements in efficiency, security, performance, effort and cost of adding AI to products and services using the Bonseyes platform, four complementary demonstrators will be built. Bonseyes platform capabilities are intended to align with the European FI-PPP activities and to take advantage of its flagship project, FIWARE. This paper provides a description of the project motivation, goals and preliminary work.

Citations: 31

An Enhanced Image Reconstruction Tool for Computed Tomography on GPUs

Proceedings of the Computing Frontiers Conference

Pub Date : 2017-05-15 DOI: 10.1145/3075564.3078889

Xiaodong Yu, Hao Wang, Wu-chun Feng, H. Gong, Guohua Cao

The algebraic reconstruction technique (ART) is an iterative algorithm for CT (i.e., computed tomography) image reconstruction that delivers better image quality with less radiation dosage than the industry-standard filtered back projection (FBP). However, the high computational cost of ART requires researchers to turn to high-performance computing to accelerate the algorithm. Alas, existing approaches for ART suffer from inefficient design of compressed data structures and computational kernels on GPUs. Thus, this paper presents cuART, our enhanced CUDA-based CT image reconstruction tool built on the algebraic reconstruction technique. It delivers a compression and parallelization solution for ART-based image reconstruction on GPUs. We address the underperformance of popular GPU libraries (e.g., cuSPARSE, BRC, and CSR5) on the ART algorithm and propose a symmetry-based CSR format (SCSR) to further compress the CSR data structure and optimize data access for both SpMV and SpMV_T via a column-indices permutation. We also propose sorting-based and sorting-free blocking techniques to optimize the kernel computation by leveraging the sparsity patterns of the system matrix. The end result is that cuART can reduce the memory footprint significantly and enable practical CT datasets to fit into a single GPU. The experimental results on an NVIDIA Tesla K80 GPU illustrate that our approach can achieve up to 6.8x, 7.2x, and 5.4x speedups over counterparts that use cuSPARSE, BRC, and CSR5, respectively.
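
As background, the classic form of ART is the Kaczmarz row-projection iteration applied to a sparse system Ax = b, where each row of A describes one ray through the image. The sketch below runs that iteration over a plain CSR matrix in sequential Python. It does not implement the SCSR format, the blocking techniques, or any GPU kernel from the paper. The function name, argument layout, and relaxation parameter are illustrative assumptions, and the code assumes each CSR row lists a column index at most once.

```python
def art_reconstruct(indptr, indices, data, b, n_cols, sweeps=50, relax=1.0):
    """Kaczmarz-style ART over a CSR matrix (indptr, indices, data).

    For each row a_i, project the current estimate x onto the hyperplane
    a_i . x = b_i, scaled by the relaxation factor `relax`.
    """
    x = [0.0] * n_cols
    n_rows = len(indptr) - 1
    for _ in range(sweeps):
        for i in range(n_rows):
            start, end = indptr[i], indptr[i + 1]
            norm_sq = sum(data[k] * data[k] for k in range(start, end))
            if norm_sq == 0.0:
                continue  # an empty row carries no constraint
            # Sparse dot product of row i with the current estimate.
            dot = sum(data[k] * x[indices[k]] for k in range(start, end))
            step = relax * (b[i] - dot) / norm_sq
            # Only the columns present in row i are touched.
            for k in range(start, end):
                x[indices[k]] += step * data[k]
    return x
```

Each row update reads and writes only the nonzero columns of that row, which is why the layout of the compressed matrix has such a direct effect on memory traffic.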

Citations: 13

SGXKernel: A Library Operating System Optimized for Intel SGX

Proceedings of the Computing Frontiers Conference

Pub Date : 2017-05-15 DOI: 10.1145/3075564.3075572

H. Tian, Yong Zhang, Chunxiao Xing, Shoumeng Yan

Intel Software Guard Extensions (SGX) is an emerging trusted hardware technology. SGX enables user-level code to allocate regions of trusted memory, called enclaves, where the confidentiality and integrity of code and data are guaranteed. While SGX offers strong security for applications, one limitation of SGX is the lack of system call support inside enclaves, which leads to a non-trivial refactoring effort when protecting existing applications with SGX. To address this issue, previous works have ported existing library OSes to SGX. However, these library OSes are suboptimal in terms of security and performance since they are designed without taking into account the characteristics of SGX. In this paper, we revisit the library OS approach in a new setting: Intel SGX. We first quantitatively evaluate the performance impact of enclave transitions on SGX programs, identifying it as a performance bottleneck for any library OS that aims to support system-intensive SGX applications. We then present the design and implementation of SGXKernel, an in-enclave library OS, highlighting its switchless design, which obviates the need for enclave transitions. This switchless design is achieved by incorporating two novel ideas: asynchronous cross-enclave communication and preemptible in-enclave multi-threading. We intensively evaluate the performance of SGXKernel on microbenchmarks and application benchmarks. The results show that SGXKernel significantly outperforms a state-of-the-art library OS that has been ported to SGX.

Citations: 26

Analytical Performance Modeling and Validation of Intel's Xeon Phi Architecture

Proceedings of the Computing Frontiers Conference

Pub Date : 2017-05-15 DOI: 10.1145/3075564.3075593

Sudheer Chunduri, Prasanna Balaprakash, V. Morozov, V. Vishwanath, Kalyan Kumaran

Modeling the performance of scientific applications on emerging hardware plays a central role in achieving extreme-scale computing goals. Analytical models that capture the interaction between applications and hardware characteristics are attractive because even a reasonably accurate model can be useful for performance tuning before the hardware is made available. In this paper, we develop a hardware model for Intel's second-generation Xeon Phi architecture code-named Knights Landing (KNL) for the SKOPE framework. We validate the KNL hardware model by projecting the performance of minibenchmarks and application kernels. The results show that our KNL model can project the performance with prediction errors of 10% to 20%. The hardware model also provides informative recommendations for code transformations and tuning.

Citations: 5

GPU-UniCache: Automatic Code Generation of Spatial Blocking for Stencils on GPUs

Proceedings of the Computing Frontiers Conference

Pub Date : 2017-05-15 DOI: 10.1145/3075564.3075583

Kaixi Hou, Hao Wang, Wu-chun Feng

Spatial blocking is a critical memory-access optimization to efficiently exploit the computing resources of parallel processors, such as many-core GPUs. By reusing cache-loaded data over multiple spatial iterations, spatial blocking can significantly lessen the pressure of accessing slow global memory. Stencil computations, for example, can exploit such data reuse via spatial blocking through the memory hierarchy of the GPU to improve performance. However, approaches that take advantage of such blocking require complex and tedious changes to the GPU kernels for different stencils, GPU architectures, and multi-level cached systems. In this work, we explore the challenges of different spatial blocking strategies over three cache levels of the GPU (i.e., L1 cache, scratchpad memory, and registers) and propose a framework, GPU-UniCache, to automatically generate code to access buffered data in the cached systems of GPUs. Based on the characteristics of spatial blocking over various stencil kernels, we generalize the patterns of data communication, index conversion, and synchronization (with abstracted ISA-friendly interfaces) and map them to different architectures with highly optimized code variants. Our approach greatly simplifies the design of efficient and portable stencil computations across GPUs. Compared to stencil kernels based on hardware-managed memory (L1 cache) and other state-of-the-art GPU benchmarks, GPU-UniCache can achieve significant improvements.
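
The idea of spatial blocking can be illustrated on the CPU with a 5-point stencil: each tile, together with a one-cell halo, is copied into a small local buffer, and all neighbour reads for that tile are served from the buffer. The Python sketch below is only an analogy for what a GPU kernel would do with scratchpad memory or registers. It is not code generated by GPU-UniCache, and the function names, the tile size, and the 0.2 weights are assumptions chosen for illustration.

```python
def stencil_naive(grid):
    """5-point average over interior cells, reading straight from `grid`."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # boundary cells are copied unchanged
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = 0.2 * (grid[r][c] + grid[r - 1][c] + grid[r + 1][c]
                               + grid[r][c - 1] + grid[r][c + 1])
    return out


def stencil_blocked(grid, tile=4):
    """Same stencil, but each tile is computed from a local halo buffer."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r0 in range(1, rows - 1, tile):
        for c0 in range(1, cols - 1, tile):
            r1 = min(r0 + tile, rows - 1)  # exclusive end of the tile
            c1 = min(c0 + tile, cols - 1)
            # Load the tile plus a one-cell halo on every side.
            buf = [grid[r][c0 - 1:c1 + 1] for r in range(r0 - 1, r1 + 1)]
            for r in range(r0, r1):
                for c in range(c0, c1):
                    # Index conversion from grid coordinates to buffer
                    # coordinates; the +1 accounts for the halo.
                    br, bc = r - r0 + 1, c - c0 + 1
                    out[r][c] = 0.2 * (buf[br][bc]
                                       + buf[br - 1][bc] + buf[br + 1][bc]
                                       + buf[br][bc - 1] + buf[br][bc + 1])
    return out
```

The index conversion between grid and buffer coordinates is the kind of bookkeeping that changes with the stencil shape and the memory level used, which is the part the abstract describes as tedious to write by hand.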

Citations: 24

Social Engineering 2.0: A Foundational Work: Invited Paper

Proceedings of the Computing Frontiers Conference

Pub Date : 2017-05-15 DOI: 10.1145/3075564.3076260

Davide Ariu, E. Frumento, G. Fumera

During the past few years, social engineering has rapidly evolved and has become a mainstream technique in cybercrime and terrorism. It is used especially in targeted attacks involving complex human and technological exploits, aimed at deceiving humans and IT systems. Building on the work carried out in the DOGANA project, funded by the European Union, this paper provides an overview of the evolution and of the current landscape of social engineering, and introduces as its main contribution a theoretical model of how human exploits are built, named the Victim Communication Stack.

Citations: 9

Designing Swarms of Cyber-Physical Systems: the H2020 CPSwarm Project: Invited Paper

Proceedings of the Computing Frontiers Conference

Pub Date : 2017-05-15 DOI: 10.1145/3075564.3077628

A. Bagnato, Regina Krisztina Bíró, Dario Bonino, C. Pastrone, W. Elmenreich, René Reiners, M. Schranz, Edin Arnautovic

Cyber-Physical Systems (CPS) find applications in a number of large-scale, safety-critical domains, e.g., transportation and smart cities. The increasing interactions amongst different CPS are starting to generate unpredictable behaviors and emergent properties, often leading to unforeseen and/or undesired results. Rather than being an unwanted byproduct, however, these interactions could become an advantage if they were explicitly managed and accounted for from the early design stages. The CPSwarm project, presented in this paper, aims at tackling these challenges by easing the development and integration of complex herds of heterogeneous CPS. Thanks to CPSwarm, systems designed through a combination of existing and emerging tools will collaborate on the basis of local policies and exhibit a collective behavior capable of solving complex, real-world problems. Three real-world use cases will demonstrate the validity of foundational assumptions of the presented approach as well as the viability of the developed tools and methodologies.

Citations: 21
