
12th Euromicro Conference on Parallel, Distributed and Network-Based Processing, 2004. Proceedings: Latest Papers

On the performance of the POSIX I/O interface to PVFS
M. Vilayannur, R. Ross, P. Carns, R. Thakur, A. Sivasubramaniam, M. Kandemir
The ever-increasing gap in performance between CPU/memory technologies and the I/O subsystem (disks, I/O buses) in modern workstations has exacerbated the I/O bottlenecks inherent in applications that access large disk-resident data sets. A common technique for alleviating I/O bottlenecks on clusters of workstations is the use of parallel file systems. One such parallel file system is the Parallel Virtual File System (PVFS), a freely available tool for achieving high-performance I/O on Linux-based clusters. Here, we describe the performance and scalability of the UNIX I/O interface to PVFS. To illustrate the performance, we present experimental results using Bonnie++, a commonly used benchmark for file system throughput; a synthetic parallel I/O application that calculates aggregate read and write bandwidths; and a synthetic benchmark that times untarring the Linux kernel source tree to measure the performance of a large number of small file operations. We obtained aggregate read and write bandwidths as high as 550 MB/s with a Myrinet-based network and 160 MB/s with Fast Ethernet.
DOI: https://doi.org/10.1109/EMPDP.2004.1271463
Published: 2004-03-08
Citations: 5
Cooperative software multithreading to enhance utilization of embedded processors for network applications
C. Albrecht, Rainer Hagenau, Andreas C. Döring
Multithreading is an effective way to improve the efficiency of processor cores in embedded products for networking infrastructures. To make such improvements accessible to processor cores without hardware support for multithreading, we present a concept for efficient software multithreading through compiler post-pass optimization of the application code. Our approach aims at reducing the overhead of cooperative multithreading context switches at compile time by using standard compiler techniques such as context-insensitive analysis. Additionally, register usage is rearranged to reduce the amount of context-switch code by exploiting multiple-load/store instructions. A performance model analysis shows the benefit of our approach and encourages the use of software multithreading to improve processor utilization. We present results obtained with an implementation for the PowerPC ISA (Instruction Set Architecture) using the code of a real network application (iSCSI). We were able to reduce the expected run-time of a context switch to as little as 38% of the original.
DOI: https://doi.org/10.1109/EMPDP.2004.1271459
Published: 2004-03-08
Citations: 7
The multikey Web cache simulator: a platform for designing proxy cache management techniques
L. Cárdenas, J. Sahuquillo, A. Pont, J. A. Gil
Proxy caches have become an important mechanism for reducing latencies. Efficient proxy cache management techniques that exploit the inherent characteristics of Web objects are essential to reaching good performance. One important segment of the replacement algorithms applied today consists of the multikey algorithms, which use several keys or object characteristics to decide which object or objects must be replaced. This feature is not considered in most current simulators. In this paper we propose a proxy-cache platform for checking the performance of Web-object management techniques and algorithms based on multiple keys. The proposed platform is coded in a modular way, which allows newly proposed algorithms or policies to be implemented in an easy and robust manner. In addition to classical performance metrics such as the hit ratio and the byte hit ratio, the proposed framework also reports the response time perceived by users.
DOI: https://doi.org/10.1109/EMPDP.2004.1271471
Published: 2004-03-08
Citations: 15
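To make the metrics named in the multikey cache simulator abstract concrete, here is a minimal sketch of a trace-driven cache simulation. It is not the authors' platform: the `simulate` function, the `Entry` record, and the choice of (hit count, last-use time) as the two replacement keys are assumptions made purely for illustration. It reports the hit ratio (hits / requests) and the byte hit ratio (bytes served from cache / bytes requested).

```python
from dataclasses import dataclass


@dataclass
class Entry:
    size: int       # object size in bytes
    hits: int       # first key: how often the object was requested
    last_used: int  # second key: logical time of the last request


def simulate(trace, capacity):
    """Replay (url, size) requests against a cache of `capacity` bytes.

    Hypothetical multikey policy: evict the object with the fewest hits,
    breaking ties by least-recent use. Returns (hit_ratio, byte_hit_ratio).
    """
    cache = {}
    used = 0
    hit_count = hit_bytes = total_bytes = 0
    for clock, (url, size) in enumerate(trace):
        total_bytes += size
        entry = cache.get(url)
        if entry is not None:          # cache hit
            hit_count += 1
            hit_bytes += size
            entry.hits += 1
            entry.last_used = clock
            continue
        if size > capacity:            # object can never fit; do not cache it
            continue
        while used + size > capacity:  # evict until the new object fits
            victim = min(cache, key=lambda u: (cache[u].hits, cache[u].last_used))
            used -= cache.pop(victim).size
        cache[url] = Entry(size=size, hits=1, last_used=clock)
        used += size
    if not trace:
        return 0.0, 0.0
    return hit_count / len(trace), hit_bytes / total_bytes


trace = [("a", 100), ("b", 200), ("a", 100), ("c", 300), ("a", 100)]
print(simulate(trace, capacity=400))  # (0.4, 0.25)
```

In this toy trace the two metrics differ because the hits all fall on a small object: 2 of 5 requests hit, but only 200 of 800 requested bytes come from the cache.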
Performance evaluation on grids: directions, issues, and open problems
Z. Németh, G. Gombás, Z. Balaton
Grids are semantically different from other distributed systems. Therefore, performance analysis, just like any other technique, requires careful reconsideration. We analyse the fundamental differences between grids and other systems and point out the special requirements they place on performance analysis. The main aim is to survey the special problems, the possible directions and the existing solutions. A monitoring system that is able to support these requirements is introduced as an example.
DOI: https://doi.org/10.1109/EMPDP.2004.1271458
Published: 2004-03-08
Citations: 39
Workflow principles applied to multi-solution analysis of dependable distributed systems
Francesco Moscato, N. Mazzocca, V. Vittorini
Real-world dependable distributed systems are often heterogeneous, not only in their physical composition, but also from a modeling and analysis perspective. Indeed, different components may be modeled using the most suitable modeling formalism, and multi-solution strategies may be applied to analyze the resulting multi-formalism model, since no single solution method is adequate to solve all submodels. We present the architecture of an extensible multi-formalism framework for the modeling and design of distributed dependable systems. We show that the process needed to solve/analyze a model expressed through different formalisms may be described as if it were a business process and executed by means of a workflow engine. We apply the proposed technique to a fault-tolerant remote SCADA (supervisory control and data acquisition) system.
DOI: https://doi.org/10.1109/EMPDP.2004.1271438
Published: 2004-03-08
Citations: 6
Parallelization of time series forecasting model
J. Górriz, C. Puntonet, M. Salmerón, R. Martín-Clemente
We show a parallel neural network (cross-over prediction model) for time series statistical learning, implemented in PVM ("parallel virtual machine") and MPI ("message passing interface") in order to reduce computational time. Parallelization is achieved in two ways: updating autoregressive parameters via a genetic algorithm and evaluating the overall prediction function via a parallel neural network. PVM permits a heterogeneous collection of Unix computers networked together to be viewed by our program as a simple parallel computer. We show different architectures of parallel processor systems and discuss their computing model.
DOI: https://doi.org/10.1109/EMPDP.2004.1271434
Published: 2004-03-08
Citations: 2
Optimization techniques for irregular and pointer-based programs
R. Asenjo, F. Corbera, E. Gutiérrez, M. Navarro, O. Plata, E. Zapata
Current compilers show inefficiencies when optimizing complex applications, both in analyzing dependences and in exploiting performance-critical features such as data locality and instruction/thread parallelism. Complex applications usually present irregular and/or dynamic (pointer-based) computational/data structures. By irregular we mean applications that arrange data as multidimensional arrays and issue memory references through array indirections. Pointer-based applications, on the other hand, organize data as pointer-based structures (lists, trees, etc.) and issue memory references by means of pointers. We discuss optimization/parallelization and program analysis techniques we have developed to instruct a compiler to generate efficient object code from important classes of irregular and pointer-based applications. These techniques are embodied in a methodology that proceeds in three stages: program structure recognition, data analysis, and program optimization/parallelization based on code/data transformations.
DOI: https://doi.org/10.1109/EMPDP.2004.1271420
Published: 2004-03-08
Citations: 4
An approach to massively distributed aggregate computing on peer-to-peer networks
Márk Jelasity, W. Kowalczyk, M. Steen
The emergence of the Internet as a computing platform increases the demand for new classes of algorithms that combine massive distributed processing and complete decentralization. Moreover, these algorithms should be able to execute in an environment that is heterogeneous, changes almost continuously, and consists of millions of nodes. One class of algorithms that can play an important role in such environments is aggregate computing: computing the aggregation of attributes such as extremal values, mean, and variance. These algorithms typically find their application in distributed data mining and systems management. We present novel, massively scalable and fully decentralized algorithms for computing aggregates, and substantiate our scalability claims through simulations and theoretical analysis.
DOI: https://doi.org/10.1109/EMPDP.2004.1271446
Published: 2004-03-08
Citations: 42
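The aggregate computing abstract above does not spell out its algorithms, so the following is only a sketch of one well-known decentralized approach to computing a mean: gossip-style pairwise averaging, in which a node repeatedly picks a random peer and both replace their values with the pair's average. The function name `gossip_average`, the round structure, and the uniform peer selection are assumptions for illustration, and the whole network is simulated in a single process.

```python
import random
from statistics import fmean, pvariance


def gossip_average(values, rounds, seed=0):
    """Simulate pairwise averaging over `rounds` rounds (needs >= 2 nodes).

    In each round every node contacts one uniformly chosen other node,
    and both adopt the average of their two current values. The sum of
    all values is preserved, so every node drifts toward the global mean.
    """
    rng = random.Random(seed)
    state = list(values)
    n = len(state)
    for _ in range(rounds):
        for i in range(n):
            j = rng.randrange(n - 1)
            if j >= i:          # skip over i so that a node never picks itself
                j += 1
            avg = (state[i] + state[j]) / 2.0
            state[i] = state[j] = avg
    return state


start = [float(i) for i in range(100)]
end = gossip_average(start, rounds=30)
print(fmean(start), fmean(end))          # the mean is preserved
print(pvariance(start), pvariance(end))  # the spread across nodes collapses
```

No node ever sees more than one peer's value at a time, and no coordinator exists, which is what makes this style of computation attractive for very large, changing networks.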
The effect of the degree of multistage interconnection networks on their performance: the case of delta and over-sized delta networks
A. C. Aljundi, J. Dekeyser
Interconnection network performance is a key factor when constructing parallel computers. Today's technological progress makes it possible to build and use crossbars of sizes up to 128. Crossbars can be used as switching elements (SEs) in the intercommunication systems of parallel architectures, such as multistage interconnection networks (MINs). A MIN is usually defined, among other things, by its topology. One of the factors defining the topology of a MIN is its degree. The degree of a MIN is the size of the SEs of which it is composed. We are interested in studying the influence of the degree of two classes of MINs on their performance. The MIN classes tested are the well-known delta networks and a subclass of this family called over-sized delta networks. This study is to be used in future work to evaluate the use of MINs as an intercommunication medium in symmetric multiprocessors.
DOI: https://doi.org/10.1109/EMPDP.2004.1271430
Published: 2004-03-08
Citations: 3
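To illustrate how the degree shapes a MIN, as discussed in the abstract above, the sketch below computes the structure of a square delta network with N = d^n ports built from d x d crossbars: it has n = log_d N stages of N/d switching elements each. The function name `delta_network_shape` and the use of crosspoint count (d^2 per crossbar) as a hardware-cost proxy are assumptions for illustration; the sketch says nothing about over-sized delta networks or about the performance results of the paper.

```python
def delta_network_shape(n_ports, degree):
    """Return (stages, switches_per_stage, crosspoints) for a square
    delta network with `n_ports` inputs built from degree x degree crossbars.

    Requires n_ports to be an exact power of `degree`.
    """
    if degree < 2 or n_ports < 2:
        raise ValueError("degree and n_ports must both be at least 2")
    stages, size = 0, 1
    while size < n_ports:       # find n such that degree**n == n_ports
        size *= degree
        stages += 1
    if size != n_ports:
        raise ValueError("n_ports must be a power of degree")
    switches_per_stage = n_ports // degree
    crosspoints = stages * switches_per_stage * degree ** 2
    return stages, switches_per_stage, crosspoints


for d in (2, 4, 8, 64):
    print(d, delta_network_shape(64, d))
# 2 (6, 32, 768)
# 4 (3, 16, 768)
# 8 (2, 8, 1024)
# 64 (1, 1, 4096)
```

A larger degree means fewer stages for a message to traverse, at the price of larger crossbars; a single 64 x 64 crossbar is the limiting case.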
Improving cache locality with blocked array layouts
Evangelia Athanasaki, N. Koziris
Minimizing cache misses is one of the most important factors in reducing the average latency of memory accesses. Tiled codes modify the instruction stream to exploit cache locality for array accesses. Here, we further reduce cache misses by restructuring the memory layout of multidimensional arrays that are accessed by tiled instruction code. In our method, array elements are stored in a blocked way, exactly as they are swept by the tiled instruction stream. We present a straightforward way to translate multidimensional indexing of arrays into their blocked memory layout using simple binary-mask operations. Indices for such array layouts are easily calculated based on the algebra of dilated integers, similarly to Morton-order indexing. Actual experimental results, using matrix multiplication and LU decomposition on arrays of various sizes, illustrate that execution time is greatly improved when combining tiled code with tiled array layouts and binary mask-based index translation functions. Simulations using the SimpleScalar tool verify that the enhanced performance is due to the considerable reduction of total cache misses.
DOI: https://doi.org/10.1109/EMPDP.2004.1271460
Published: 2004-03-08
Citations: 4
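The blocked-layout abstract above compares its index calculation to Morton-order indexing with dilated integers. The sketch below shows that related technique, not the paper's own layout: a 16-bit integer is "dilated" by spreading its bits onto the even bit positions using only shifts and masks, and a 2-D index is formed by interleaving the dilated row and column. The function names and the 16-bit limit are assumptions for illustration.

```python
EVEN_BITS = 0x55555555  # mask selecting bit positions 0, 2, 4, ...
ODD_BITS = 0xAAAAAAAA   # mask selecting bit positions 1, 3, 5, ...


def dilate(x):
    """Spread the low 16 bits of x onto the even bit positions."""
    x &= 0xFFFF
    x = (x | (x << 8)) & 0x00FF00FF
    x = (x | (x << 4)) & 0x0F0F0F0F
    x = (x | (x << 2)) & 0x33333333
    x = (x | (x << 1)) & EVEN_BITS
    return x


def morton_index(row, col):
    """Interleave row and column bits: row on odd bits, column on even bits."""
    return (dilate(row) << 1) | dilate(col)


def dilated_add(a, b):
    """Add two dilated integers without un-dilating them.

    Filling the odd (gap) bits of `a` with ones lets carries ripple
    across the gaps; masking afterwards clears the gap bits again.
    """
    return ((a | ODD_BITS) + b) & EVEN_BITS


print(morton_index(2, 3))                                # 13
print(dilated_add(dilate(3), dilate(1)) == dilate(4))    # True
```

Because `dilated_add` works directly on the dilated form, a loop can step through rows or columns of such a layout with a few mask operations per iteration instead of re-interleaving the bits each time.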