
Proceedings 1998 International Conference on Parallel and Distributed Systems (Cat. No.98TB100250): latest papers

Toward supporting data parallel programming on clusters of symmetric multiprocessors
Chia-Lien Chiang, Jan-Jan Wu, Nai-Wei Lin
The paper reports the design of a runtime library for data-parallel programming on clusters of symmetric multiprocessors (SMP clusters). Our design exploits a hybrid methodology that maps directly onto the hierarchical memory system of SMP clusters by combining two programming styles: threads (shared-memory programming) within an SMP node and message passing between SMP nodes. This hybrid approach has been used to implement a library for collective communications. The prototype library is built on standard interfaces for threads (pthread) and message passing (MPI). Experimental results on a cluster of Sun UltraSparc-II workstations are reported.
DOI: https://doi.org/10.1109/ICPADS.1998.741143 | Published: 1998-12-14
Citations: 1
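To make the hybrid idea in this abstract concrete, here is a minimal sketch of a two-level sum reduction: threads combine their values through shared memory inside a node, and only one partial result per node crosses the node boundary. This is not the authors' library. The node boundary is simulated with a `queue.Queue` standing in for MPI messages, and `hybrid_reduce_sum` and its parameter are hypothetical names chosen for illustration.

```python
import queue
import threading


def hybrid_reduce_sum(values_per_node):
    """Sum all values using threads inside a node and messages between nodes.

    values_per_node[n][t] is the value held by thread t on simulated node n.
    """
    inbox = queue.Queue()  # assumption: stands in for inter-node message passing

    def node_main(values):
        partial = [0]
        lock = threading.Lock()

        def worker(v):
            # Intra-node step: threads combine through shared memory.
            with lock:
                partial[0] += v

        threads = [threading.Thread(target=worker, args=(v,)) for v in values]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        # Inter-node step: one message per node, not one per thread.
        inbox.put(partial[0])

    nodes = [threading.Thread(target=node_main, args=(vals,))
             for vals in values_per_node]
    for n in nodes:
        n.start()
    for n in nodes:
        n.join()

    total = 0
    for _ in values_per_node:
        total += inbox.get()
    return total
```

The point of the sketch is the message count: with N nodes and T threads per node, only N values travel between nodes rather than N*T.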
On reconfiguring query execution plans in distributed object-relational DBMS
K. Ng, Zhenghao Wang, R. Muntz, E. C. Shek
Massive database sizes and growing demands for decision support and data mining result in long-running queries in extensible object-relational DBMSs, particularly in decision support and data warehousing analysis applications. Parallelization of query evaluation is often required for acceptable performance, yet queries are frequently processed suboptimally for two reasons: (1) only coarse or inaccurate estimates of the query characteristics and database statistics are available prior to query evaluation; and (2) system configuration and resource availability change during query evaluation. In a distributed environment, dynamically reconfiguring query execution plans (QEPs), so that they adapt to the environment as well as to the query characteristics, is a promising means to significantly improve query evaluation performance. Based on an operator classification, we propose an algorithm to coordinate the steps in a reconfiguration and introduce alternatives for execution context checkpointing and restoring. We also propose a syntactic extension of SQL that exposes the relevant characteristics of user-defined functions in support of dynamic reconfiguration. An example from the experimental system is presented.
DOI: https://doi.org/10.1109/ICPADS.1998.741020 | Published: 1998-12-14
Citations: 9
A programmable digital neuro-processor design with dynamically reconfigurable pipeline/parallel architecture
Young-Jin Jang, Chan-Ho Park, Hyon-Soo Lee
Previous neural network processors were configured either into a SIMD or into an instruction systolic array (ISA) ring architecture using the canonical mapping methodology. The disadvantages of these processors are the lack of generality, scalability, programmability and reconfigurability. So, we propose a programmable neuroprocessor whose architecture is dynamically reconfigurable into either SIMD or an ISA ring according to the data dependencies of any neural network model. To improve the computing time, the computation of an activation function, which typically needed tens of cycles in previous processors, can be done in a single cycle by using piecewise linear (PWL) function approximation. Using a simple bus architecture and instruction set, the proposed processor allows the implementation of neural networks larger than the physical processor element array and allows the user to solve any neural network model. We verify these properties with the error backpropagation (EBP) model and estimate the computation time of the proposed processor.
DOI: https://doi.org/10.1109/ICPADS.1998.741014 | Published: 1998-12-14
Citations: 3
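The piecewise linear (PWL) approximation mentioned in this abstract can be illustrated in software. The sketch below approximates the logistic sigmoid by linear interpolation between a few breakpoints. The breakpoint positions and the name `pwl_sigmoid` are assumptions made for illustration; the abstract does not say which activation function or segments the processor uses.

```python
import math
from bisect import bisect_right

# Hypothetical breakpoints; a hardware design would choose its own segments.
BREAKPOINTS = [-8.0, -4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 4.0, 8.0]
VALUES = [1.0 / (1.0 + math.exp(-x)) for x in BREAKPOINTS]


def pwl_sigmoid(x):
    """Approximate the logistic sigmoid by interpolating between breakpoints."""
    # Outside the table, clamp to the value at the nearest end breakpoint.
    if x <= BREAKPOINTS[0]:
        return VALUES[0]
    if x >= BREAKPOINTS[-1]:
        return VALUES[-1]
    # Find the segment [x0, x1) that contains x.
    i = bisect_right(BREAKPOINTS, x) - 1
    x0, x1 = BREAKPOINTS[i], BREAKPOINTS[i + 1]
    y0, y1 = VALUES[i], VALUES[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
```

Each evaluation needs one table lookup, one multiply, and one add, which is why this kind of approximation suits a short, fixed-latency datapath.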
Detecting the first races in parallel programs with ordered synchronization
Hee-Dong Park, Yong-Kee Jun
Detecting races is important for debugging shared-memory parallel programs, because races result in unintended nondeterministic executions. Previous on-the-fly techniques for detecting races in programs with inter-thread coordination, such as ordered synchronization, cannot guarantee that the race detected first is not preceded by events that also participate in a race. The paper presents a novel two-pass on-the-fly algorithm to detect the first races in such parallel programs. Detecting the first races is important in debugging, because removing them may make other races disappear, including those detected first by the previous techniques. This technique therefore makes on-the-fly race detection more effective and practical for debugging parallel programs.
DOI: https://doi.org/10.1109/ICPADS.1998.741043 | Published: 1998-12-14
Citations: 12
A cost and performance comparison for wormhole routers based on HDL designs
T. Yoshinaga, Masaya Hayashi, Maki Horita, Y. Yamaguchi, K. Ootsu, T. Baba
Our research investigates cost and performance characteristics for wormhole routers based on HDL designs. A comparison of dimension-order routers and turn-model-based adaptive routers leads to the following conclusions: (1) the static and additional routing information that we propose, such as prior dimension specification and in-order delivery, improves communication performance; (2) an adaptive routing algorithm must be implemented so that it satisfies the target speed of the design, because the operation speed of the routers significantly affects network performance; (3) when virtual channels degrade the operation speed, they cancel the improvement not only for the dimension-order router but also for the naive implementation of the adaptive routers.
DOI: https://doi.org/10.1109/ICPADS.1998.741100 | Published: 1998-12-14
Citations: 3
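Dimension-order routing, the deterministic baseline in this comparison, is easy to state in code. The sketch below assumes a 2D mesh with nodes addressed as `(x, y)` and corrects the X coordinate fully before touching Y. The function name and the mesh topology are assumptions; the paper's HDL router design is not reproduced here.

```python
def dimension_order_route(src, dst):
    """Return the node sequence from src to dst on a 2D mesh, X first, then Y."""
    x, y = src
    path = [(x, y)]
    # Phase 1: move along the X dimension until the column matches.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Phase 2: move along the Y dimension until the row matches.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path
```

Because every packet between the same pair of nodes takes the same path, the router needs no adaptivity logic, which is the cost advantage the adaptive designs have to justify.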
Fault tolerant all-to-all broadcast in general interconnection networks
Yuzhong Sun, P. Cheung, X. Lin, Keqin Li
Given the scalability requirements and arbitrary topologies of the underlying networks in multiprogramming and multithreaded environments, fault tolerance in acknowledged all-to-all broadcast (ATAB) and concurrent communication is a challenge for reliable general wormhole-routed multicomputers with arbitrary topologies. This paper proposes the virtual ring tree (VRT) to address that challenge. The two proposed algorithms need only a single startup, achieved through a simple virtual node space; cacheable virtual channels also reduce the complexity of routing at intermediate steps of ATAB algorithms and of restarting an ATAB. The proposed algorithm can automatically handle static faults in networks.
DOI: https://doi.org/10.1109/ICPADS.1998.741050 | Published: 1998-12-14
Citations: 3
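In an all-to-all broadcast, every node ends up holding every node's message. As a rough illustration of the communication pattern only, the sketch below simulates the textbook ring formulation, where each node forwards what it last received to its clockwise neighbour. It is not the VRT algorithm from the paper and it does no fault handling; `ring_all_to_all_broadcast` is a hypothetical name.

```python
def ring_all_to_all_broadcast(messages):
    """Simulate all-to-all broadcast on a ring of len(messages) nodes.

    Returns, for each node, the list of messages in the order it learned them.
    """
    n = len(messages)
    received = [[m] for m in messages]  # each node starts with its own message
    in_flight = list(messages)          # item each node forwards next
    for _ in range(n - 1):
        # Node i receives from node i-1; index -1 wraps around the ring.
        in_flight = [in_flight[i - 1] for i in range(n)]
        for i in range(n):
            received[i].append(in_flight[i])
    return received
```

After n-1 steps every node has seen all n messages, and each link carries exactly one message per step.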
Two problems on butterfly graphs
Shien-Ching Hwang, Gen-Huey Chen
The cycle partition problem and the pancycle problem on butterfly graphs are studied in this paper. Suppose $G=(V,E)$ is a graph and $\{V_1,V_2,\ldots,V_s\}$ is a partition of $V$. We say that $\{V_1,V_2,\ldots,V_s\}$ forms a cycle partition of $G$ if each subgraph of $G$ induced by $V_i$ contains a cycle of length $|V_i|$, where $1 \le i \le s$. A cycle partition $\{V_1,V_2,\ldots,V_s\}$ is $\lambda$-uniform if $|V_1|=|V_2|=\cdots=|V_s|=\lambda$. $G$ has $\lambda$-complete uniform cycle partitions if $G$ has $m\lambda$-uniform cycle partitions for all $1 \le m \le (r+n)/2$ and $m$ dividing $|V|/\lambda$. Let $BF(k,r)$ denote the $r$-dimensional $k$-ary butterfly graph. For the cycle partition problem, we construct many uniform cycle partitions for $BF(k,r)$. In addition, we construct $r$-complete uniform cycle partitions for $BF(2,r)$, and $kr$-complete uniform cycle partitions for $BF(k,r)$. For the pancycle problem, given any pair of $n$ and $r$ we can determine whether there exists a cycle of length $n$ in $BF(2,r)$, and construct it if it exists. The results of this paper reveal that butterfly graphs are well suited to embedding rings. They can embed rings of almost all possible lengths, and in many situations they can embed the maximum number of rings of the same length.
DOI: https://doi.org/10.1109/ICPADS.1998.741134 | Published: 1998-12-14
Citations: 1
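The definition of a uniform cycle partition can be checked mechanically on a small instance. The sketch below assumes the common wrapped-butterfly convention: vertices are pairs (level, word) with r levels and k-ary words of length r, and an edge joins level l to level (l+1) mod r when the words agree everywhere except possibly at position l. The paper may use a different convention, and all function names here are hypothetical. The "straight" cycles, which walk through all r levels without changing the word, give an r-uniform cycle partition for r >= 3.

```python
from itertools import product


def wrapped_butterfly(k, r):
    """Adjacency sets of the wrapped butterfly; vertices are (level, word)."""
    adj = {}
    for level in range(r):
        nxt = (level + 1) % r
        for word in product(range(k), repeat=r):
            for digit in range(k):
                # Moving to the next level may rewrite the digit at `level`.
                other = word[:level] + (digit,) + word[level + 1:]
                adj.setdefault((level, word), set()).add((nxt, other))
                adj.setdefault((nxt, other), set()).add((level, word))
    return adj


def is_cycle_partition(adj, cycles):
    """True if the cycles are vertex-disjoint, cover the graph, and use real edges."""
    seen = [v for cycle in cycles for v in cycle]
    if len(seen) != len(set(seen)) or set(seen) != set(adj):
        return False
    return all(
        len(cycle) >= 3 and cycle[(i + 1) % len(cycle)] in adj[cycle[i]]
        for cycle in cycles
        for i in range(len(cycle))
    )


def straight_cycles(k, r):
    """One cycle per word: visit every level while keeping the word fixed."""
    return [[(level, word) for level in range(r)]
            for word in product(range(k), repeat=r)]
```

For example, `is_cycle_partition(wrapped_butterfly(2, 3), straight_cycles(2, 3))` checks eight vertex-disjoint cycles of length 3 that together cover all 24 vertices.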
Performance evaluation of cache depot on CC-NUMA multiprocessors
Hung-Chang Hsiao, C. King
Cache depot is a performance enhancement technique for cache-coherent non-uniform memory access (CC-NUMA) multiprocessors, in which nodes in the system store extra memory blocks on behalf of other nodes. In this way memory requests from a node can be satisfied by nearby depot nodes without going all the way to the home node. This not only reduces memory access latency and network traffic, but also spreads the network load more evenly. We study a design strategy for cache depot that enhances the network interface of each node with a depot cache, which stores those extra memory blocks for other nodes, and that employs a new multicast routing scheme, called multi-hop worms, which works cooperatively with depot caches to transmit coherence messages. By considering message routing and depot caches together, the design concept can be applied even to CC-NUMA systems that have a non-hierarchical, scalable interconnection network. We have developed an execution-driven simulator to evaluate the effectiveness of the design strategy. Performance results from four SPLASH-2 benchmarks show that the design strategy improves the performance of the CC-NUMA multiprocessor by 11% to 21%. We have also studied in depth various factors that affect the performance of cache depot.
DOI: https://doi.org/10.1109/ICPADS.1998.741127 | Published: 1998-12-14
Citations: 0
Object replication using version vector
K. Hasegawa, H. Higaki, M. Takizawa
In object-based systems, objects supporting abstract methods are replicated to increase performance, reliability, and availability. We discuss a novel object-based locking (OBL) protocol that locks replicas of objects by extending the quorum-based protocol for read and write to abstract methods. Unless two methods conflict, the subsets of replicas locked by the methods do not intersect, even if the methods change the replicas. In the OBL protocol, methods that have been computed on another replica but not on a replica A are computed on A when a method conflicting with them is issued to A. We propose a new version vector to identify which methods have been computed on a replica.
DOI: https://doi.org/10.1109/ICPADS.1998.741033 | Published: 1998-12-14
Citations: 8
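For readers unfamiliar with version vectors, the sketch below shows the classic comparison, in which each vector maps an identifier (a replica id in the classic scheme) to a count of updates seen from it. This is the generic technique only; the vector proposed in the paper tracks which methods have been computed on a replica, and its exact layout is not given in the abstract. The function name is hypothetical.

```python
def compare_version_vectors(a, b):
    """Return 'equal', 'before', 'after', or 'concurrent' for two version vectors.

    Missing entries count as zero.
    """
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # b has seen everything a has, and more
    if b_le_a:
        return "after"       # a has seen everything b has, and more
    return "concurrent"      # each side has updates the other lacks
```

A "before" result means the lagging replica can catch up by applying the missing updates, while "concurrent" signals that the two replicas have diverged and need reconciliation.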
Incrementally extensible folded hypercube graphs
Hung-Yi Chang, Rong-Jaye Chen
In this paper we propose the incrementally extensible folded hypercube (IEFH) graph as a new class of interconnection networks for an arbitrary number of nodes. We show that this system is optimally fault tolerant and almost regular (i.e., the difference between the maximum and the minimum node degree is at most one). The diameter of this topology is half that of the incomplete hypercube (IH), the supercube, or the IEH graph. We also devise a simple routing algorithm for the IEFH graph. Further, we embed cycles and complete binary trees into this graph optimally.
DOI: https://doi.org/10.1109/ICPADS.1998.741133 | Published: 1998-12-14
Citations: 5
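The halved diameter is easiest to see on the standard folded hypercube, which is defined only for 2^n nodes: it adds one link from every node to its bitwise complement. The sketch below builds that graph and measures its diameter by breadth-first search. It does not construct the IEFH graph itself, which extends the idea to an arbitrary number of nodes; the function names are hypothetical.

```python
from collections import deque


def folded_hypercube(n):
    """Adjacency lists of the folded hypercube on 2**n nodes."""
    mask = (1 << n) - 1
    return {
        # n ordinary hypercube links plus one link to the complement node.
        v: [v ^ (1 << i) for i in range(n)] + [v ^ mask]
        for v in range(1 << n)
    }


def diameter(adj):
    """Largest BFS distance between any two nodes of a connected graph."""
    best = 0
    for src in adj:
        dist = {src: 0}
        todo = deque([src])
        while todo:
            u = todo.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    todo.append(w)
        best = max(best, max(dist.values()))
    return best
```

With n = 4, the plain hypercube has diameter 4, while the folded version has diameter 2: any node more than two bit flips away is reached faster by first jumping to the complement.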