Pub Date : 2001-02-01DOI: 10.1109/EMPDP.2001.904965
Elvira Baydal, P. López, J. Duato
Deadlock avoidance and recovery techniques suffer from severe performance degradation when the network is close to or beyond saturation. Many parallel applications produce bursty traffic that may saturate the network during some intervals, and increase execution time. Therefore, the use of techniques that prevent network saturation are of crucial importance in both deadlock avoidance and recovery strategies. Several mechanisms have been proposed in the literature to reach this goal. However some of them do not work well under all network load conditions. Others introduce some penalty when the network is not fully saturated, or complicate network and/or node implementation. In this paper we propose a new mechanism to avoid network saturation that overcomes these drawbacks. In this mechanism, each node estimates network traffic locally by using the percentage of free virtual output channels that can be used for forwarding a message towards its destination. When this number surpasses a threshold value, network congestion is assumed to exist and message injection is forbidden.
{"title":"A congestion control mechanism for wormhole networks","authors":"Elvira Baydal, P. López, J. Duato","doi":"10.1109/EMPDP.2001.904965","DOIUrl":"https://doi.org/10.1109/EMPDP.2001.904965","url":null,"abstract":"Deadlock avoidance and recovery techniques suffer from severe performance degradation when the network is close to or beyond saturation. Many parallel applications produce bursty traffic that may saturate the network during some intervals, and increase execution time. Therefore, the use of techniques that prevent network saturation are of crucial importance in both deadlock avoidance and recovery strategies. Several mechanisms have been proposed in the literature to reach this goal. However some of them do not work well under all network load conditions. Others introduce some penalty when the network is not fully saturated, or complicate network and/or node implementation. In this paper we propose a new mechanism to avoid network saturation that overcomes these drawbacks. In this mechanism, each node estimates network traffic locally by using the percentage of free virtual output channels that can be used for forwarding a message towards its destination. When this number surpasses a threshold value, network congestion is assumed to exist and message injection is forbidden.","PeriodicalId":262971,"journal":{"name":"Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing","volume":"4300 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133416977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1900-01-01DOI: 10.1109/EMPDP.2001.905072
U. Ruckert
Three different hardware implementations of artificial neural networks are presented. The chips are model-specific integrated circuits for neural associative memories, self-organizing feature maps and local cluster neural networks. Some of the key implementational issues are considered and especially the question of resource-efficiency is discussed.
{"title":"ULSI architectures for artificial neural networks","authors":"U. Ruckert","doi":"10.1109/EMPDP.2001.905072","DOIUrl":"https://doi.org/10.1109/EMPDP.2001.905072","url":null,"abstract":"Three different hardware implementations of artificial neural networks are presented. The chips are model-specific integrated circuits for neural associative memories, self-organizing feature maps and local cluster neural networks. Some of the key implementational issues are considered and especially the question of resource-efficiency is discussed.","PeriodicalId":262971,"journal":{"name":"Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116403120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1900-01-01DOI: 10.1109/EMPDP.2001.905061
M. Garzarán, J. L. Briz, P. Ibáñez, V. Viñals
Data prefetching has been widely studied as a technique to hide memory access latency in multiprocessors. Most recent research on hardware prefetching focuses either on uniprocessors, or on distributed shared memory (DSM) and other non bus-based organizations. However, in the context of bus-based SMPs, prefetching poses a number of problems related to the lack of scalability and limited bus bandwidth of these modest-sized machines. This paper considers how the number of processors and the memory access patterns in the program influence the relative performance of sequential and non-sequential prefetching mechanisms in a bus-based SMP. We compare the performance of four inexpensive hardware prefetching techniques, varying the number of processors. After a breakdown of the results based on a performance model, we propose a cost-effective hardware prefetching solution for implementing on such modest-sized multiprocessors.
{"title":"Hardware prefetching in bus-based multiprocessors: pattern characterization and cost-effective hardware","authors":"M. Garzarán, J. L. Briz, P. Ibáñez, V. Viñals","doi":"10.1109/EMPDP.2001.905061","DOIUrl":"https://doi.org/10.1109/EMPDP.2001.905061","url":null,"abstract":"Data prefetching has been widely studied as a technique to hide memory access latency in multiprocessors. Most recent research on hardware prefetching focuses either on uniprocessors, or on distributed shared memory (DSM) and other non bus-based organizations. However, in the context of bus-based SMPs, prefetching poses a number of problems related to the lack of scalability and limited bus bandwidth of these modest-sized machines. This paper considers how the number of processors and the memory access patterns in the program influence the relative performance of sequential and non-sequential prefetching mechanisms in a bus-based SMP. We compare the performance of four inexpensive hardware prefetching techniques, varying the number of processors. After a breakdown of the results based on a performance model, we propose a cost-effective hardware prefetching solution for implementing on such modest-sized multiprocessors.","PeriodicalId":262971,"journal":{"name":"Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117321946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}