
Latest publications: 2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)

A Machine Learning Approach for Parameter Screening in Earthquake Simulation
Marisol Monterrubio Velasco, J. C. Carrasco-Jiménez, Octavio Castillo Reyes, F. Cucchietti, J. Puente
Earthquakes are the result of rupture in the Earth's crust. The rupture process is difficult to model deterministically due to the number of unmeasurable parameters involved and poorly constrained physical conditions, as well as the very diverse scales involved, from nucleation (meters) to complete failure (up to hundreds of kilometers). In this work we focus on synthetic seismic catalogs generated with a stochastic modeling technique called the Fiber Bundle Model (FBM). These catalogs can be readily compared with statistical measures computed from real earthquake series, but the link between the FBM parameters and the characteristics of the resulting earthquake series is difficult to assess. Furthermore, the stochastic nature of the model requires a large number of realizations to attain statistical robustness. The aim of this work is to estimate the FBM parameters that generate aftershock sequences similar to those produced by real seismic events. To estimate the optimal combination of parameters, we executed a large number of simulations with different parameter combinations, using High-Performance Computing (HPC) resources to reduce compute time. Lastly, the synthetic datasets were used to train a supervised Machine Learning (ML) model that analyzes and extracts statistical patterns reproducing the observed aftershock occurrence and its spatio-temporal distribution in real seismic events.
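The sweep-then-learn workflow can be illustrated with a toy stand-in for the FBM. Everything below — the simulator, its parameter names, and the summary statistics — is invented for illustration; the real model and catalogs are far richer.

```python
import itertools
import random
import statistics

def simulate_fbm(load_transfer, strength_spread, n_events=200, seed=0):
    # Toy stand-in for an FBM aftershock simulation: emits a synthetic
    # magnitude series whose statistics depend on the two parameters.
    rng = random.Random(seed)
    scale = load_transfer + strength_spread
    return [rng.expovariate(1.0 / scale) for _ in range(n_events)]

def summarize(catalog):
    # Summary statistics of a catalog, comparable to those computed
    # from real earthquake series.
    return (statistics.mean(catalog), statistics.stdev(catalog), max(catalog))

# Parameter screening: sweep a grid of combinations with several
# stochastic realizations each (the step the paper runs on HPC resources).
grid = list(itertools.product([0.5, 1.0, 1.5], [0.1, 0.3]))
dataset = [(summarize(simulate_fbm(lt, ss, seed=s)), (lt, ss))
           for lt, ss in grid for s in range(5)]

# A supervised ML model would now be trained on `dataset` to map catalog
# statistics back to the generating parameters.
```

In the paper the independent realizations for each parameter combination are what make the sweep expensive enough to require HPC resources.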
DOI: 10.1109/CAHPC.2018.8645865
Citations: 8
Online Detection of Spectre Attacks Using Microarchitectural Traces from Performance Counters
Congmiao Li, J. Gaudiot
To improve processor performance, computer architects have adopted acceleration techniques such as speculative execution and caching. However, researchers have recently discovered that this approach implies inherent security flaws, as exploited by Meltdown and Spectre. Attacks targeting these vulnerabilities can leak protected data through side channels such as data cache timing by exploiting mis-speculated executions. The flaws can be catastrophic because they are fundamental, widespread, and affect many modern processors. Mitigating Meltdown is relatively straightforward, as it entails a software-based fix that has already been deployed by major OS vendors. However, to this day there is no effective mitigation for Spectre; fixing the problem may require a redesign of the architecture for conditional execution in future processors. In addition, a Spectre attack is hard to detect with traditional software-based antivirus techniques because it leaves no traces in conventional log files. In this paper, we propose to monitor microarchitectural events, such as cache misses and branch mispredictions, from existing CPU performance counters to detect Spectre at attack runtime. Our detector achieved 0% false negatives with less than 1% false positives using various machine learning classifiers, at a reasonable performance overhead.
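The detection idea can be sketched as follows. The counter values here are synthetic and the fixed-threshold rule merely stands in for the machine learning classifiers used in the paper; real readings would come from hardware performance monitoring units (e.g. via perf).

```python
import random

rng = random.Random(42)

def sample_window(attack):
    # Spectre gadget training tends to raise cache misses and branch
    # mispredictions; these distributions are illustrative only.
    base_miss = 0.30 if attack else 0.05
    base_mispred = 0.20 if attack else 0.02
    return {
        "cache_miss_rate": base_miss + rng.gauss(0, 0.02),
        "branch_mispredict_rate": base_mispred + rng.gauss(0, 0.01),
    }

def classify(window, miss_thr=0.15, mispred_thr=0.10):
    # Flag a window as suspicious when both counters are elevated;
    # the paper trains ML classifiers instead of fixed thresholds.
    return (window["cache_miss_rate"] > miss_thr
            and window["branch_mispredict_rate"] > mispred_thr)

windows = [(sample_window(attack), attack) for attack in [False, True] * 50]
accuracy = sum(classify(w) == label for w, label in windows) / len(windows)
```

The online detector in the paper works the same way at a high level: collect short windows of counter traces during execution and classify each window as benign or attack.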
DOI: 10.1109/CAHPC.2018.8645918
Citations: 27
A Fault-Tolerant Agent-Based Architecture for Transient Servers in Fog Computing
J. P. A. Neto, D. Pianto, C. Ralha
Cloud datacenters are exploiting their idle resources by offering virtual machines as transient servers without availability guarantees. Spot instances are transient servers offered by Amazon AWS, with rules that set prices according to supply and demand. These instances run for as long as the current price stays below the maximum bid price given by the user. Spot instances have been increasingly used for executing computation- and memory-intensive applications. By using dynamic fault-tolerance mechanisms and appropriate strategies, users can effectively use spot instances to run applications at a lower price. This paper presents a resilient, multi-strategy, agent-based cloud computing architecture. The architecture combines machine learning and a statistical model to predict instance survival times, refine fault-tolerance parameters, and reduce total execution time. We evaluate our strategies, and the experiments demonstrate high accuracy, reaching a 94% survival prediction success rate, which indicates that the model can be effectively used to define execution strategies that prevent failures at revocation events under realistic working conditions.
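A minimal sketch of the survival-prediction idea, with an invented price model, thresholds, and strategy names — the paper's actual statistical model and agent architecture are considerably more elaborate:

```python
import random

rng = random.Random(7)
# Synthetic spot-price history in $/hour; a real strategy would use the
# provider's published price traces.
price_history = [0.10 + 0.05 * rng.random() for _ in range(1000)]

def survival_probability(bid, history):
    # Empirical probability that the bid exceeds the price in a single
    # pricing interval (a crude stand-in for the paper's survival model).
    return sum(price < bid for price in history) / len(history)

def choose_strategy(bid, history, horizon_intervals=24):
    # Probability of surviving the whole horizon, assuming independent
    # intervals; checkpoint aggressively when revocation looks likely.
    p_survive = survival_probability(bid, history) ** horizon_intervals
    return "no-checkpoint" if p_survive > 0.9 else "periodic-checkpoint"

low_bid_plan = choose_strategy(0.11, price_history)   # revocation likely
high_bid_plan = choose_strategy(0.20, price_history)  # bid above all observed prices
```

The point of predicting survival times is exactly this kind of decision: a confident prediction of a long lifetime lets the agent skip expensive fault-tolerance work, while a likely revocation triggers checkpointing.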
DOI: 10.1109/CAHPC.2018.8645859
Citations: 2
Variable-Size Batched Condition Number Calculation on GPUs
H. Anzt, J. Dongarra, Goran Flegar, Thomas Grützmacher
We present a kernel that is designed to quickly compute the condition number of a large collection of tiny matrices on a graphics processing unit (GPU). The matrices can differ in size and the process integrates the use of pivoting to ensure a numerically-stable matrix inversion. The performance assessment reveals that, in double precision arithmetic, the new GPU kernel achieves up to 550 GFLOPs (billions of floating-point operations per second) and 800 GFLOPs on NVIDIA's P100 and V100 GPUs, respectively. The results also demonstrate a considerable speed-up with respect to a workflow that computes the condition number via launching a set of four batched kernels. In addition, we present a variable-size batched kernel for the computation of the matrix infinity norm. We show that this memory-bound kernel achieves up to 90% of the sustainable peak bandwidth.
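The quantity being batched is the condition number κ(A) = ‖A‖·‖A⁻¹‖. A plain-Python sketch for a single matrix, using Gauss-Jordan inversion with partial pivoting (the stabilization step the abstract mentions) and the infinity norm (maximum absolute row sum); the GPU kernel applies the same computation to many tiny matrices at once:

```python
def invert_with_pivoting(a):
    # Gauss-Jordan inversion with partial pivoting, the stabilization
    # step the batched kernel integrates for tiny matrices.
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def inf_norm(a):
    # Infinity norm: maximum absolute row sum.
    return max(sum(abs(x) for x in row) for row in a)

def condition_number_inf(a):
    return inf_norm(a) * inf_norm(invert_with_pivoting(a))

identity_kappa = condition_number_inf([[1.0, 0.0], [0.0, 1.0]])
hilbert4 = [[1.0 / (i + j + 1) for j in range(4)] for i in range(4)]
hilbert_kappa = condition_number_inf(hilbert4)  # severely ill-conditioned
```

The identity matrix has condition number 1, while even a 4×4 Hilbert matrix is ill-conditioned by several orders of magnitude, which is why stable inversion with pivoting matters.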
DOI: 10.1109/CAHPC.2018.8645907
Citations: 1
Copyright
DOI: 10.1109/cahpc.2018.8645922
Citations: 0
Multicore Performance Engineering of Sparse Triangular Solves Using a Modified Roofline Model
M. Wittmann, G. Hager, R. Janalík, M. Lanser, A. Klawonn, O. Rheinbach, O. Schenk, G. Wellein
The Roofline model is widely used to visualize the performance of executed code together with the upper performance bounds given by the memory bandwidth and the processor peak performance. The model can thus provide an insightful visualization of bottlenecks. In this paper, we try to establish realistic bandwidth ceilings for the sparse triangular solve step of PARDISO, a leading sparse direct solver package that is also part of the Intel MKL library. The performance of the forward and backward substitution process is analyzed and benchmarked for a representative set of sparse matrices on seven modern x86-type multicore architectures and the Knights Landing manycore architecture. We show how to accurately measure the necessary quantities for threaded code as well, and discuss the measurement approach, its validation, and its limitations. Our modeling approach covers the serial and parallel execution phases, allowing for in-socket performance predictions.
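The Roofline bound itself is simple: attainable performance is P = min(P_peak, I·b_s), where I is the arithmetic intensity in flop/byte and b_s the memory bandwidth. A sketch with illustrative machine figures (not the paper's measured ceilings):

```python
def roofline_bound(intensity_flops_per_byte, peak_gflops, bandwidth_gbs):
    # Attainable performance is capped by either the compute roof or the
    # bandwidth-limited slope, whichever is lower.
    return min(peak_gflops, intensity_flops_per_byte * bandwidth_gbs)

# A sparse triangular solve is strongly memory-bound: its low arithmetic
# intensity keeps it on the bandwidth slope, far below peak.
sptrsv_bound = roofline_bound(0.125, peak_gflops=2000.0, bandwidth_gbs=100.0)  # 12.5 GFLOP/s
dense_bound = roofline_bound(50.0, peak_gflops=2000.0, bandwidth_gbs=100.0)    # 2000.0 GFLOP/s
```

The paper's modification is essentially about putting a realistic value into `bandwidth_gbs` for the triangular solve, rather than the machine's nominal peak bandwidth.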
DOI: 10.1109/CAHPC.2018.8645938
Citations: 9
Energy Efficient Parallel K-Means Clustering for an Intel® Hybrid Multi-Chip Package
M. Souza, L. Maciel, Pedro Henrique Penna, H. Freitas
FPGA devices have proven to be good candidates for accelerating applications from different research areas. For instance, machine learning applications such as K-Means clustering usually rely on processing large amounts of data, and, despite the performance offered by other architectures, FPGAs can offer better energy efficiency. With that in mind, Intel has launched a platform that integrates a multicore CPU and an FPGA in the same package, enabling low-latency and coherent fine-grained data offload. In this paper, we present a parallel implementation of the K-Means clustering algorithm for this novel platform, written in OpenCL, and compare it against other platforms. We found that the CPU+FPGA platform was more energy efficient than the CPU-only approach, by 70.71% to 85.92% with the Standard and Tiny input sizes respectively, and up to 68.21% of performance improvement was obtained with the Tiny input size. Furthermore, with the Standard input size it was up to 7.2× more energy efficient than an Intel® Xeon Phi™, 21.5× more than a cluster of Raspberry Pi boards, and 3.8× more than the low-power MPPA-256 architecture.
DOI: 10.1109/CAHPC.2018.8645850
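For reference, the clustering step itself can be sketched as a minimal sequential K-Means (Lloyd's algorithm) in plain Python; the paper's contribution is an OpenCL-parallel version of this computation offloaded to the FPGA of the hybrid package:

```python
import random

def kmeans(points, k, iters=20, seed=1):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[c])))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for j, cluster in enumerate(clusters):
            if cluster:
                centroids[j] = tuple(sum(coord) / len(cluster)
                                     for coord in zip(*cluster))
    return centroids

points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centroids = kmeans(points, k=2)
```

Both steps are data-parallel over the points, which is what makes the algorithm a natural fit for OpenCL offload.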
Citations: 3
Serendipity: How Supercomputing Technology is Enabling a Revolution in Artificial Intelligence
José Moreira
DOI: 10.1109/cahpc.2018.8645849
Citations: 0
Program Committees
DOI: 10.1109/cahpc.2018.8645915
Citations: 0
Automatic Ray-Tracer Cloud Offloading in OPENMP
M. Mortatti, H. Yviquel, G. Araújo
Rendering an image from a 3D scene requires a large amount of computation, which grows exponentially with the complexity of the scene (e.g. the number of objects and light sources). With the increasing demand for high-definition content, 3D designers need high-performance computer systems to keep rendering times acceptable. Since owning computer clusters is expensive, designers usually rent computing power directly from cloud service providers (e.g. AWS and Azure). However, even though many cloud providers already offer dedicated rendering services, integrating them within the standard workflow of modeling software can become a complex and cumbersome task. It typically requires exporting the project from the design software, dealing with the various access-control mechanisms of different clouds to upload the project, and executing the rendering remotely through the command line. Offloading computation to the cloud is a technique that can considerably simplify such tasks. To achieve that, this paper uses an extension of OpenMP 4.X to eliminate any major interaction with the end user, while minimizing the complexity of cloud integration and optimizing the design workflow. It applies this approach to a ray-tracing application, a simplified version of the engines used by professional 3D modeling software (e.g. Blender). It automatically offloads the rendering process from the user's computer to a computer cluster within the Microsoft Azure cloud, brings the resulting images back after the computation ends, and displays them directly on the screen of the user's computer, thus providing a transparent programming model and good speed-ups over local execution.
DOI: 10.1109/CAHPC.2018.8645871
Citations: 1