Cache Oblivious Strategies to Exploit Multi-Level Memory on Manycore Systems
Neil A. Butcher, Stephen L. Olivier, P. Kogge
Pub Date: 2020-11-01, DOI: 10.1109/MCHPC51950.2020.00011
Many-core systems are beginning to feature novel large, high-bandwidth intermediate memory as a visible part of the memory hierarchy. This paper discusses how to make use of intermediate memory when composing matrix multiply with transpose to compute $A \cdot A^T$. We re-purpose the cache-oblivious approach developed by Frigo et al. and apply it to the composition of a bandwidth-bound kernel (transpose) with a compute-bound kernel (matrix multiply). Particular focus is on matrix shapes far from square, which are not usually considered. Our codes are simpler than hand-optimized codes but come reasonably close in performance. Perhaps more importantly, we develop a paradigm for constructing other codes that use intermediate memories.
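As a rough illustration of the cache-oblivious style the abstract refers to, the sketch below recursively computes C += A·Bᵀ by always splitting the largest dimension, so calling it with B = A yields A·Aᵀ. It is a minimal serial sketch in C: the base-case size, the flat row-major layout, and the omission of the explicit transpose/staging through intermediate memory are all assumptions for illustration, not the authors' code.

```c
#include <stdio.h>
#include <string.h>

/* Cache-oblivious C += A * B^T (Frigo-style divide and conquer): halve the
 * largest of m, n, k at every level so the working set shrinks geometrically.
 * Row-major storage with leading dimensions lda, ldb, ldc. */
enum { BASE = 32 };   /* illustrative base-case size */

static void mm_abt(int m, int n, int k,
                   const double *A, int lda,
                   const double *B, int ldb,
                   double *C, int ldc)
{
    if (m <= BASE && n <= BASE && k <= BASE) {
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++) {
                double s = C[i * ldc + j];
                for (int l = 0; l < k; l++)
                    s += A[i * lda + l] * B[j * ldb + l];
                C[i * ldc + j] = s;
            }
        return;
    }
    if (m >= n && m >= k) {            /* split rows of A and C */
        int m1 = m / 2;
        mm_abt(m1, n, k, A, lda, B, ldb, C, ldc);
        mm_abt(m - m1, n, k, A + (size_t)m1 * lda, lda, B, ldb, C + (size_t)m1 * ldc, ldc);
    } else if (n >= k) {               /* split rows of B, columns of C */
        int n1 = n / 2;
        mm_abt(m, n1, k, A, lda, B, ldb, C, ldc);
        mm_abt(m, n - n1, k, A, lda, B + (size_t)n1 * ldb, ldb, C + n1, ldc);
    } else {                           /* split the shared k dimension */
        int k1 = k / 2;
        mm_abt(m, n, k1, A, lda, B, ldb, C, ldc);
        mm_abt(m, n, k - k1, A + k1, lda, B + k1, ldb, C, ldc);
    }
}

int main(void)
{
    enum { M = 3, K = 5 };
    double A[M * K], C[M * M];
    for (int i = 0; i < M * K; i++) A[i] = i + 1;
    memset(C, 0, sizeof C);
    mm_abt(M, M, K, A, K, A, K, C, M);             /* C = A * A^T */
    printf("C[0][0] = %g (expected 55)\n", C[0]);  /* 1+4+9+16+25 */
    return 0;
}
```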
{"title":"Cache Oblivious Strategies to Exploit Multi-Level Memory on Manycore Systems","authors":"Neil A. Butcher, Stephen L. Olivier, P. Kogge","doi":"10.1109/MCHPC51950.2020.00011","DOIUrl":"https://doi.org/10.1109/MCHPC51950.2020.00011","url":null,"abstract":"Many-core systems are beginning to feature novel large, high-bandwidth intermediate memory as a visible part of the memory hierarchy. This paper discusses how to make use of intermediate memory when composing matrix multiply with transpose to compute $A$ * AT. We re-purpose the cache-oblivious approach developed by Frigo et al. and apply it to the composition of a bandwidth-bound kernel (transpose) with a compute-bound kernel (matrix multiply). Particular focus is on regions of matrix shapes far from square that are not usually considered. Our codes are simpler than optimized codes, but reasonably close in performance. Also, perhaps of more importance is developing a paradigm for how to construct other codes using intermediate memories.","PeriodicalId":318919,"journal":{"name":"2020 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132988296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Understanding the Impact of Memory Access Patterns in Intel Processors
Mohammad Alaul Haque Monil, Seyong Lee, J. Vetter, A. Malony
Pub Date: 2020-11-01, DOI: 10.1109/MCHPC51950.2020.00012
Because of increasing complexity in the memory hierarchy, predicting the performance of a given application on a given processor is becoming more difficult. The problem is worsened by the fact that the hardware needed to handle more complex memory traffic also affects energy consumption. Moreover, in a heterogeneous system with shared main memory, the memory traffic between the last level cache (LLC) and main memory creates contention among processors and accelerator devices. For these reasons, it is important to investigate and understand the impact of different memory access patterns on the memory system. This study investigates the interplay between Intel processors' memory hierarchy and different memory access patterns in applications. The authors explore sequential streaming and strided memory access patterns with the objective of predicting LLC-dynamic random access memory (DRAM) traffic for a given application on given Intel architectures. The impact of prefetching is also investigated. Experiments with different Intel micro-architectures uncover mechanisms to predict LLC-DRAM traffic that can yield up to 99% accuracy for sequential streaming access patterns and up to 95% accuracy for strided access patterns.
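For intuition about why the access pattern determines LLC-DRAM traffic, a naive first-order estimate can be written down directly (cold cache, 64-byte lines, prefetchers ignored). The paper's per-microarchitecture predictors are more involved; the function and constants below are illustrative assumptions only.

```c
#include <stdio.h>

/* First-order estimate of LLC-to-DRAM read traffic in bytes for n accesses of
 * elem_size-byte elements at a fixed stride (in elements), assuming a cold
 * cache, 64-byte lines, and no hardware prefetching. */
static long long est_dram_read_bytes(long long n, long long elem_size, long long stride)
{
    const long long LINE = 64;
    long long step = elem_size * stride;   /* bytes between consecutive accesses */
    if (step >= LINE)
        return n * LINE;                   /* every access pulls in a fresh line */
    /* dense sweep: roughly one transfer per line covered by the n accesses */
    return ((n * step + LINE - 1) / LINE) * LINE;
}

int main(void)
{
    long long n = 1LL << 20;               /* one million 8-byte loads */
    printf("sequential: %lld MiB\n", est_dram_read_bytes(n, 8, 1)  >> 20);
    printf("stride 16 : %lld MiB\n", est_dram_read_bytes(n, 8, 16) >> 20);
    return 0;
}
```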
{"title":"Understanding the Impact of Memory Access Patterns in Intel Processors","authors":"Mohammad Alaul Haque Monil, Seyong Lee, J. Vetter, A. Malony","doi":"10.1109/MCHPC51950.2020.00012","DOIUrl":"https://doi.org/10.1109/MCHPC51950.2020.00012","url":null,"abstract":"Because of increasing complexity in the memory hierarchy, predicting the performance of a given application in a given processor is becoming more difficult. The problem is worsened by the fact that the hardware needed to deal with more complex memory traffic also affects energy consumption. Moreover, in a heterogeneous system with shared main memory, the memory traffic between the last level cache (LLC) and the memory creates contention between other processors and accelerator devices. For these reasons, it is important to investigate and understand the impact of different memory access patterns on the memory system. This study investigates the interplay between Intel processors' memory hierarchy and different memory access patterns in applications. The authors explore sequential streaming and strided memory access patterns with the objective of predicting LLC-dynamic random access memory (DRAM) traffic for a given application in given Intel architectures. Moreover, the impact of prefetching is also investigated in this study. Experiments with different Intel micro-architectures uncover mechanisms to predict LLC-DRAM traffic that can yield up to 99% accuracy for sequential streaming access patterns and up to 95% accuracy for strided access patterns.","PeriodicalId":318919,"journal":{"name":"2020 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC)","volume":"213 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132128609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Performance Potential of Mixed Data Management Modes for Heterogeneous Memory Systems
T. Effler, Michael R. Jantz, T. Jones
Pub Date: 2020-11-01, DOI: 10.1109/MCHPC51950.2020.00007
Many high-performance systems now include different types of memory devices within the same compute platform to meet strict performance and cost constraints. Such heterogeneous memory systems often include an upper-level tier with better performance but limited capacity, and lower-level tiers with higher capacity but less bandwidth and longer latencies for reads and writes. To utilize the different memory layers efficiently, current systems rely on hardware-directed, memory-side caching or provide facilities in the operating system (OS) that allow applications to make their own data-tier assignments. Since these data management options each come with their own set of trade-offs, many systems also include mixed data management configurations that allow applications to employ hardware- and software-directed management simultaneously, but for different portions of their address space. Despite the opportunity to address limitations of stand-alone data management options, such mixed management modes are under-utilized in practice and have not been evaluated in prior studies of complex memory hardware. In this work, we develop custom program profiling, configurations, and policies to study the potential of mixed data management modes to outperform hardware- or software-based management schemes alone. Our experiments, conducted on an Intel Knights Landing platform with high-bandwidth memory, demonstrate that the mixed data management mode achieves the same or better performance than the best stand-alone option for five memory-intensive benchmark applications (run separately and in isolation), with an average speedup of over 10% compared to the best stand-alone policy.
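A mixed configuration of the kind studied here can be expressed on Knights Landing with the memkind/hbwmalloc interface: a bandwidth-critical array is placed in MCDRAM explicitly while the rest of the heap stays under hardware-directed management. The sketch below is a generic example of that split, not the authors' profiling-driven policy; the array sizes and the choice of which array is "hot" are assumptions.

```c
#include <hbwmalloc.h>   /* memkind's high-bandwidth memory allocator; link with -lmemkind */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t n = 1UL << 26;   /* assumed array length: ~512 MiB of doubles each */

    if (hbw_check_available() != 0)
        fprintf(stderr, "warning: no high-bandwidth memory detected\n");

    /* Software-directed placement for the bandwidth-critical array ... */
    double *hot = hbw_malloc(n * sizeof(double));
    /* ... while the rest of the heap stays hardware-managed (served through
     * the MCDRAM cache when the node boots in cache or hybrid mode). */
    double *cold = malloc(n * sizeof(double));
    if (hot == NULL || cold == NULL) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    for (size_t i = 0; i < n; i++)
        hot[i] = cold[i] = (double)i;   /* stand-in for the real workload */

    hbw_free(hot);
    free(cold);
    return 0;
}
```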
{"title":"Performance Potential of Mixed Data Management Modes for Heterogeneous Memory Systems","authors":"T. Effler, Michael R. Jantz, T. Jones","doi":"10.1109/MCHPC51950.2020.00007","DOIUrl":"https://doi.org/10.1109/MCHPC51950.2020.00007","url":null,"abstract":"Many high-performance systems now include different types of memory devices within the same compute platform to meet strict performance and cost constraints. Such heterogeneous memory systems often include an upper-level tier with better performance, but limited capacity, and lower-level tiers with higher capacity, but less bandwidth and longer latencies for reads and writes. To utilize the different memory layers efficiently, current systems rely on hardware-directed, memory -side caching or they provide facilities in the operating system (OS) that allow applications to make their own data-tier assignments. Since these data management options each come with their own set of trade-offs, many systems also include mixed data management configurations that allow applications to employ hardware- and software-directed management simultaneously, but for different portions of their address space. Despite the opportunity to address limitations of stand-alone data management options, such mixed management modes are under-utilized in practice, and have not been evaluated in prior studies of complex memory hardware. In this work, we develop custom program profiling, configurations, and policies to study the potential of mixed data management modes to outperform hardware- or software-based management schemes alone. Our experiments, conducted on an Intel ® Knights Landing platform with high-bandwidth memory, demonstrate that the mixed data management mode achieves the same or better performance than the best stand-alone option for five memory intensive benchmark applications (run separately and in isolation), resulting in an average speedup compared to the best stand-alone policy of over 10 %, on average.","PeriodicalId":318919,"journal":{"name":"2020 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115034020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Message from the Workshop Chairs
NMIC
Pub Date: 2020-11-01, DOI: 10.1109/mchpc51950.2020.00004
New computation technologies, such as big data analytics, modern machine learning, artificial intelligence (AI), blockchain, and security processing, have great potential to be embedded into the network to make it intelligent and trustworthy. On the other hand, Information-Centric Networking (ICN), software-defined networking (SDN), network function virtualization (NFV), Delay Tolerant Networks (DTN), Vehicular Ad Hoc Networks (VANET), network slicing, and data center networks have emerged as novel networking paradigms for fast and efficient data delivery and retrieval. Against this backdrop, there is a strong trend to move computation from the cloud not only to the edge but also to resource-sufficient networking nodes, which drives the convergence of these emerging networking concepts with the new computation technologies. The NMIC 2019 workshop brings together researchers to discuss the technical challenges and applications of distributed computation for networking, intelligent computation supported by novel networking technologies, and the enforcement of series of computations. The accepted papers combine these computation technologies with ICN, SDN, DTN, VANET, and mobile networks to make them more intelligent, trustworthy, and efficient. We would especially like to thank the ICDCS 2019 organization team and all the TPC members of the NMIC 2019 workshop. Without their kind help, the NMIC workshop would not have been possible.
{"title":"Message from the Workshop Chairs","authors":"Nmic","doi":"10.1109/mchpc51950.2020.00004","DOIUrl":"https://doi.org/10.1109/mchpc51950.2020.00004","url":null,"abstract":"The new computation technologies, such as big data analytics, modern machine learning technology, artificial intelligence (AI), blockchain, and security processing, have the great potential to be embedded into network to enable it to be intelligent and trustworthy. On the other hand, Information-Centric Networking (ICN), software-defined network (SDN), network function virtualization (NFV), Delay Tolerant Network (DTN), Vehicular Ad Hoc NETwork (VANET), network slicing, and data center network have emerged as the novel networking paradigms for fast and efficient delivering and retrieving data. Against this backdrop, there is a strong trend to move the computations from the cloud to not only the edges but also the resource-sufficient networking nodes, which triggers the convergence between the emerging networking concepts and the new computation technologies. The NMIC workshop 2019 brings together researchers to discuss the technical challenges and applications of the distributed computations for networking, the intelligent computations supported by the novel networking technologies, and the enforcement of series of computations. The accepted papers combine the computation technologies with ICN, SDN, DTN, VANET, and mobile network to enable them to be more intelligent, trustworthy, and efficient. We would like to especially thank the ICDCS 2019 organization team and all the TPC members of NMIC workshop 2019. Without their kind help, the NMIC workshop would not be possible.","PeriodicalId":318919,"journal":{"name":"2020 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC)","volume":"137 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122434497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leveraging a Heterogeneous Memory System for a Legacy Fortran Code: The Interplay of Storage Class Memory, DRAM and OS
Steffen Christgau, T. Steinke
Pub Date: 2020-11-01, DOI: 10.1109/MCHPC51950.2020.00008
Large-capacity Storage Class Memory (SCM) opens new possibilities for workloads requiring a large memory footprint. We examine optimization strategies for a legacy Fortran application on systems with a heterogeneous memory configuration comprising SCM and DRAM. We present a performance study for the multigrid solver component of the large-eddy simulation framework PALM for different memory configurations with large-capacity SCM. An important optimization approach is the explicit assignment of storage locations, depending on the data access characteristics, to take advantage of the heterogeneous memory configuration. We demonstrate that explicit control over memory locations provides better performance than transparent hardware settings. Since page management by the OS appears to be a critical performance factor on such systems, we also study the impact of different huge page settings.
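One common way to express explicit placement of this kind is to treat the SCM DIMMs as a memory-only NUMA node and bind allocations to it. The C sketch below assumes hypothetical node numbers and array sizes and only illustrates the DRAM-versus-SCM assignment idea; it is not the PALM implementation, which is a Fortran code.

```c
#include <numa.h>    /* libnuma; link with -lnuma */
#include <stdio.h>

/* Hypothetical node numbering: 0 = DRAM, 2 = SCM exposed as a memory-only
 * NUMA node (e.g., Optane DC in App Direct mode via KMEM DAX). */
enum { DRAM_NODE = 0, SCM_NODE = 2 };

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "libnuma not available\n");
        return 1;
    }

    size_t big_bytes = 16UL << 30;   /* large, read-mostly field: SCM  */
    size_t hot_bytes = 1UL  << 30;   /* frequently updated array: DRAM */

    double *field = numa_alloc_onnode(big_bytes, SCM_NODE);
    double *work  = numa_alloc_onnode(hot_bytes, DRAM_NODE);
    if (field == NULL || work == NULL) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    /* ... solver sweeps would read 'field' occasionally and update 'work'
     *     in every iteration ... */

    numa_free(work, hot_bytes);
    numa_free(field, big_bytes);
    return 0;
}
```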
{"title":"Leveraging a Heterogeneous Memory System for a Legacy Fortran Code: The Interplay of Storage Class Memory, DRAM and OS","authors":"Steffen Christgau, T. Steinke","doi":"10.1109/MCHPC51950.2020.00008","DOIUrl":"https://doi.org/10.1109/MCHPC51950.2020.00008","url":null,"abstract":"Large capacity Storage Class Memory (SCM) opens new possibilities for workloads requiring a large memory footprint. We examine optimization strategies for a legacy Fortran application on systems with an heterogeneous memory configuration comprising SCM and DRAM. We present a performance study for the multigrid solver component of the large-eddy simulation framework PALM for different memory configurations with large capacity SCM. An important optimization approach is the explicit assignment of storage locations depending on the data access characteristic to take advantage of the heterogeneous memory configuration. We are able to demonstrate that an explicit control over memory locations provides better performance compared to transparent hardware settings. As on aforementioned systems the page management by the OS appears as critical performance factor, we study the impact of different huge page settings.","PeriodicalId":318919,"journal":{"name":"2020 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116214584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Architecting Heterogeneous Memory Systems with DRAM Technology Only: A Case Study on Relational Database
Yifan Qiao, Xubin Chen, Jingpeng Hao, Tong Zhang, C. Xie, Fei Wu
Pub Date: 2020-11-01, DOI: 10.1109/MCHPC51950.2020.00009
This paper advocates a DRAM-only design strategy for architecting high-performance, low-cost heterogeneous memory systems in future computing systems, and demonstrates its potential in the context of relational databases. In particular, we envision a heterogeneous DRAM fabric consisting of convenient but expensive byte-addressable DRAM and large-capacity, low-cost DRAM with coarse access granularity (e.g., 1K-byte). Regardless of the specific memory technology, one can reduce manufacturing cost by sacrificing raw memory reliability and apply an error correction code (ECC) to restore data storage integrity. The efficiency of ECC improves significantly as the codeword length increases, which enlarges the memory access granularity. This leads to a fundamental trade-off between memory cost and access granularity. Following this principle, the Intel 3DXP-based Optane memory DIMM internally operates with a 256-byte ECC codeword length (hence 256-byte access granularity), and Hynix recently demonstrated a low-cost DRAM DIMM with a 64-byte access granularity. This paper presents a design approach that enables relational databases to take full advantage of the envisioned low-cost heterogeneous DRAM fabric to improve performance with only minimal database source code modification. Using MySQL as a test vehicle, we implemented a prototype system and demonstrated its effectiveness under the TPC-C and Sysbench OLTP benchmarks.
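The cost/granularity trade-off can be made concrete with a back-of-the-envelope calculation: even for a simple single-error-correcting Hamming code, the relative parity overhead shrinks as the protected block grows. Real memory ECC uses stronger codes (e.g., symbol-based or BCH), so the numbers below are only an illustration of the trend the paper builds on.

```c
#include <stdio.h>

/* Smallest r with 2^r >= m + r + 1: parity bits of a single-error-correcting
 * Hamming code over m data bits. */
static int hamming_parity_bits(long m)
{
    int r = 1;
    while ((1L << r) < m + r + 1)
        r++;
    return r;
}

int main(void)
{
    long data_bytes[] = { 64, 256, 1024 };   /* cache line, Optane-like, 1K-byte */
    for (int i = 0; i < 3; i++) {
        long m = data_bytes[i] * 8;
        int  r = hamming_parity_bits(m);
        printf("%4ld-byte data block: %2d parity bits (%.2f%% overhead)\n",
               data_bytes[i], r, 100.0 * r / m);
    }
    return 0;
}
```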
{"title":"Architecting Heterogeneous Memory Systems with DRAM Technology Only: A Case Study on Relational Database","authors":"Yifan Qiao, Xubin Chen, Jingpeng Hao, Tong Zhang, C. Xie, Fei Wu","doi":"10.1109/MCHPC51950.2020.00009","DOIUrl":"https://doi.org/10.1109/MCHPC51950.2020.00009","url":null,"abstract":"This paper advocates a DRAM-only design strategy to architect high-performance low-cost heterogeneous memory systems in future computing systems, and demonstrates its potential in the context of relational database. In particular, we envision a heterogeneous DRAM fabric consisting of convenient but expensive byte-addressable DRAM and large-capacity low-cost DRAM with coarse access granularity (e.g., 1K-byte). Regardless of specific memory technology, one can reduce the manufacturing cost by sacrificing the memory raw reliability, and apply error correction code (ECC) to restore the data storage integrity. The efficiency of ECC significantly improves as the codeword length increases, which enlarges the memory access granularity. This leads to a fundamental trade-off between memory cost and access granularity. Following this principle, Intel 3DXP-based Optane memory DIMM internally operates with a 256-byte ECC codeword length (hence 256-byte access granularity), and Hynix recently demonstrated low-cost DRAM DIMM with a 64-byte access granularity. This paper presents a design approach that enables relational database to take full advantage of the envisioned low-cost heterogeneous DRAM fabric to improve performance with only minimal database source code modification. Using MySQL as a test vehicle, we implemented a prototyping system, on which we have demonstrated its effectiveness under TPC-C and Sysbench OLTP benchmarks.","PeriodicalId":318919,"journal":{"name":"2020 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129297386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Persistent Memory Object Storage and Indexing for Scientific Computing
Awais Khan, Hyogi Sim, Sudharshan S. Vazhkudai, Jinseok Ma, Myeonghoon Oh, Youngjae Kim
Pub Date: 2020-11-01, DOI: 10.1109/MCHPC51950.2020.00006
This paper presents Mosiqs, a persistent memory object storage framework with metadata indexing and querying for scientific computing. We design Mosiqs based on the key idea that memory objects on a shared PM pool can live beyond the application lifetime and become the sharing currency for applications and scientists. Mosiqs provides an aggregate memory pool atop an array of persistent memory devices to store and access memory objects. Mosiqs uses a lightweight persistent memory key-value store to manage the metadata of memory objects, such as persistent pointer mappings, which enables memory object sharing for effective scientific collaboration. Mosiqs is implemented atop PMDK. We evaluate the proposed approach on a many-core server with an array of real PM devices. The preliminary evaluation shows a 100% improvement in write performance and a 30% improvement in read performance over a PM-aware file system approach.
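Since Mosiqs is built atop PMDK, a flavor of what "memory objects that outlive the application" means can be conveyed with plain libpmemobj calls. The pool path, layout name, and metadata record below are hypothetical and the sketch is non-transactional; it is not the Mosiqs API.

```c
#include <libpmemobj.h>   /* PMDK; link with -lpmemobj */
#include <stdio.h>
#include <string.h>

/* Hypothetical metadata record for one shareable memory object; the real
 * Mosiqs schema and key-value layout are not described in the abstract. */
struct obj_meta {
    PMEMoid data;      /* persistent pointer to the payload */
    size_t  size;
    char    name[64];  /* key under which collaborators find the object */
};

int main(void)
{
    /* Pool path and layout name are assumptions for this sketch. */
    PMEMobjpool *pop = pmemobj_create("/mnt/pmem/mosiqs_demo.pool",
                                      "demo", PMEMOBJ_MIN_POOL, 0666);
    if (pop == NULL) {
        perror("pmemobj_create");
        return 1;
    }

    PMEMoid root = pmemobj_root(pop, sizeof(struct obj_meta));
    struct obj_meta *meta = pmemobj_direct(root);

    /* Allocate a payload that survives process exit and record it. */
    if (pmemobj_zalloc(pop, &meta->data, 1 << 20, 0) != 0) {
        perror("pmemobj_zalloc");
        pmemobj_close(pop);
        return 1;
    }
    meta->size = 1 << 20;
    strcpy(meta->name, "checkpoint_0042");
    /* Simplified, non-transactional: just flush the metadata to PM. */
    pmemobj_persist(pop, meta, sizeof(*meta));

    pmemobj_close(pop);
    return 0;
}
```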
{"title":"Persistent Memory Object Storage and Indexing for Scientific Computing","authors":"Awais Khan, Hyogi Sim, Sudharshan S. Vazhkudai, Jinseok Ma, Myeonghoon Oh, Youngjae Kim","doi":"10.1109/MCHPC51950.2020.00006","DOIUrl":"https://doi.org/10.1109/MCHPC51950.2020.00006","url":null,"abstract":"This paper presents Mosiqs, a persistent memory object storage framework with metadata indexing and querying for scientific computing. We design Mosiqs based on the key idea that memory objects on shared PM pool can live beyond the application lifetime and can become the sharing currency for applications and scientists. Mosiqs provides an aggregate memory pool atop an array of persistent memory devices to store and access memory objects. Mosiqs uses a lightweight persistent memory key-value store to manage the metadata of memory objects such as persistent pointer mappings, which enables memory object sharing for effective scientific collaborations. Mosiqs is implemented atop PMDK. We evaluate the proposed approach on many-core server with an array of real PM devices. The preliminary evaluation confirms a 100% improvement for write and 30% in read performance against a PM-aware file system approach.","PeriodicalId":318919,"journal":{"name":"2020 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC)","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116520062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hostile Cache Implications for Small, Dense Linear Solves
Tom Deakin, J. Cownie, Simon McIntosh-Smith, J. Lovegrove, R. Smedley-Stevenson
Pub Date: 2020-10-02, DOI: 10.1109/MCHPC51950.2020.00010
Full assembly of the stiffness matrix in finite element codes can be prohibitive in terms of the memory footprint required to store that enormous matrix. An optimisation and workaround, particularly effective for discontinuous Galerkin-based approaches, is to construct and solve small dense linear systems locally within each element, avoiding the global assembly entirely. The independent linear systems can be solved concurrently in a batched manner; however, we have found that the memory subsystem can show destructive behaviour in this paradigm, severely affecting performance. In this paper we demonstrate the range of performance that can be obtained by allocating the local systems differently, along with evidence attributing the reasons for these differences.
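One plausible way an allocation choice can provoke such destructive behaviour is when every element's local system starts at the same large power-of-two stride, so corresponding rows all map to the same cache sets. The sketch below contrasts a packed layout with a padded one; the sizes and the pad are assumptions for illustration, not the configurations measured in the paper.

```c
#include <stdio.h>
#include <stdlib.h>

/* With N = 32 doubles, one N x N local system is exactly 8 KiB, so packing
 * systems back-to-back separates row i of every element by a large power of
 * two, which can collide in the same cache sets.  A small per-system pad
 * staggers that mapping. */
enum { N = 32, NELEMS = 4096, PAD = 8 };

static double *alloc_local_systems(int padded)
{
    size_t per_sys = (size_t)N * N + (padded ? PAD : 0);  /* doubles per system */
    /* 64-byte alignment matches the cache-line size; the total size stays a
     * multiple of the alignment, as aligned_alloc (C11) requires. */
    return aligned_alloc(64, NELEMS * per_sys * sizeof(double));
}

int main(void)
{
    double *packed = alloc_local_systems(0);
    double *padded = alloc_local_systems(1);
    if (packed == NULL || padded == NULL) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    printf("per-system stride: packed %zu bytes, padded %zu bytes\n",
           (size_t)N * N * sizeof(double), ((size_t)N * N + PAD) * sizeof(double));
    free(packed);
    free(padded);
    return 0;
}
```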
{"title":"Hostile Cache Implications for Small, Dense Linear Solves","authors":"Tom Deakin, J. Cownie, Simon McIntosh-Smith, J. Lovegrove, R. Smedley-Stevenson","doi":"10.1109/MCHPC51950.2020.00010","DOIUrl":"https://doi.org/10.1109/MCHPC51950.2020.00010","url":null,"abstract":"The full assembly of the stiffness matrix in finite element codes can be prohibitive in terms of memory footprint resulting from storing that enormous matrix. An optimisation and work around, particularly effective for discontinuous Galerkin based approaches, is to construct and solve the small dense linear systems locally within each element and avoid the global assembly entirely. The different independent linear systems can be solved concurrently in a batched manner, however we have found that the memory subsystem can show destructive behaviour in this paradigm, severely affecting the performance. In this paper we demonstrate the range of performance that can be obtained by allocating the local systems differently, along with evidence to attribute the reasons behind these differences.","PeriodicalId":318919,"journal":{"name":"2020 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129869141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}