Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040855
P. Oberoi, G. Sohi
Conventional instruction fetch mechanisms fetch contiguous blocks of instructions in each cycle. They are difficult to scale since taken branches make it hard to increase the size of these blocks beyond eight instructions. Trace caches have been proposed as a solution to this problem, but they use cache space inefficiently. We show that fetching large blocks of contiguous instructions, or wide fetch, is inefficient for modern out-of-order processors. Instead of the usual approach of fetching large blocks of instructions from a single point in the program, we propose a high-bandwidth fetch mechanism that fetches small blocks of instructions from multiple points in a program. In this paper, we demonstrate that it is possible to achieve high-bandwidth fetch by using multiple narrow fetch units operating in parallel. Our mechanism performs as well as a trace cache, does not waste cache space, is more resilient to instruction cache misses, and is a natural fit for techniques that require fetching multiple threads, like multithreading, dual-path execution, and speculative threads.
Title: Out-of-order instruction fetch using multiple sequencers. Venue: Proceedings International Conference on Parallel Processing. DOI: https://doi.org/10.1109/ICPP.2002.1040855
Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040862
Y. Song, T. Pinkston
Efficient and reliable communication is essential for achieving high performance in a networked computing environment. Limited network resources bring about unavoidable competition among in-flight packets, resulting in network congestion and possibly deadlock. Many techniques have been proposed to improve performance by efficiently handling network congestion and deadlock. However, none of them provide an efficient way of accelerating the movement of packets involved in congestion onward to their destinations. In this paper, we propose a new mechanism for the detection and resolution of network congestion and deadlocks. The proposed mechanism is based on increasing the scheduling priority of packets involved in congestion and providing necessary resources for those packets to make forward progress. Simulation results show that the proposed technique outperforms previously proposed techniques by effectively dispersing network congestion.
Title: A new mechanism for congestion and deadlock resolution. Venue: Proceedings International Conference on Parallel Processing. DOI: https://doi.org/10.1109/ICPP.2002.1040862
Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040914
R. Kannan, S. Sarangi, S. Ray, S. Iyengar
Given the increasing importance of optimal sensor deployment for battlefield strategists, the converse problem of reacting to a particular deployment by an enemy is equally significant and not yet addressed in a quantifiable manner in the literature. We address this issue by modeling a two-stage game in which the opponent deploys sensors to cover a sensor field and we attempt to maximally reduce his coverage at minimal cost. In this context, we introduce the concept of minimal sensor integrity, which measures the vulnerability of any sensor deployment. We find the best response by quantifying the merits of each response. While the problem of optimally deploying sensors subject to coverage constraints is NP-complete, in this paper we show that the best response (i.e., the maximum vulnerability) can be computed in polynomial time for sensors with arbitrary coverage capabilities deployed over points in a space of any dimension. In the special case when sensor coverages form an interval graph (as in a linear grid), we describe a better O(min(M^2, NM)) dynamic programming algorithm.
Title: Minimal sensor integrity in sensor grids. Venue: Proceedings International Conference on Parallel Processing. DOI: https://doi.org/10.1109/ICPP.2002.1040914
Min-Te Sun, Lifei Huang, Shao-Cheng Wang, A. Arora, T. Lai
Multicast/broadcast is an important service primitive in networks. The IEEE 802.11 multicast/broadcast protocol is based on the basic access procedure of Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA). This protocol does not provide any media access control (MAC) layer recovery on multicast/broadcast frames. As a result, the reliability of the multicast/broadcast service is reduced due to the increased probability of lost frames resulting from interference or collisions. In this paper, we propose a reliable Batch Mode Multicast MAC protocol, BMMM, which substantially reduces the number of contention phases and thus considerably reduces the time required for a multicast/broadcast. We then propose a Location Aware Multicast MAC protocol, LAMM, that uses station location information to further improve upon BMMM. Extensive analysis and simulation results validate the reliability and efficiency of our multicast MAC protocols.
Title: Reliable MAC layer multicast in IEEE 802.11 wireless networks. Venue: Proceedings International Conference on Parallel Processing. Pub Date: 2002-08-18. DOI: https://doi.org/10.1002/wcm.129
Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040864
Nissim Harel, Hasnain A. Mandviwala, K. Knobe, U. Ramachandran
Stampede is a parallel programming system to support computationally demanding applications including interactive vision, speech and multimedia collaboration. The system alleviates concerns such as communication, synchronization, and buffer management in programming such real-time stream-oriented applications. Threads are loosely connected by channels that hold timestamped data items. There are two performance concerns when programming with Stampede. The first is space, namely, ensuring that memory is not wasted on items that are not fully processed. The second is time, namely, ensuring that processing resource is not wasted on a timestamp that is not fully processed. In this paper we introduce a single unifying framework, dead timestamp identification, that addresses both the space and time concerns simultaneously. Dead timestamps on a channel represent garbage. Dead timestamps at a thread represent computations that need not be performed. This framework has been implemented in the Stampede system. Experimental results showing the space advantage of this framework are presented. Using a color-based people tracker application, we show that the space advantage can be significant (up to 40%) compared to the previous garbage collection techniques in Stampede.
Title: Dead timestamp identification in Stampede. Venue: Proceedings International Conference on Parallel Processing. DOI: https://doi.org/10.1109/ICPP.2002.1040864
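The dead-timestamp idea in the Stampede abstract above can be illustrated with a small sketch. This is not the Stampede API: the `Channel` class, its method names, and the rule that each consumer promises a lower bound on the timestamps it will still request are all assumptions made for illustration. Under those assumptions, a timestamp below every consumer's bound is dead on the channel (its item is garbage), and a producer may skip computing it.

```python
class Channel:
    """Hypothetical channel holding timestamped items (not the Stampede API)."""

    def __init__(self, consumer_ids):
        self.items = {}
        # Assumed model: each consumer promises never to request
        # a timestamp below its recorded low-water mark.
        self.low_water = {cid: 0 for cid in consumer_ids}

    def is_dead(self, ts):
        # A timestamp is dead once every consumer has moved past it.
        return ts < min(self.low_water.values(), default=0)

    def put(self, ts, value):
        # A producer can skip work for a dead timestamp (the "time" concern).
        if self.is_dead(ts):
            return False
        self.items[ts] = value
        return True

    def advance(self, consumer_id, next_ts):
        # Record the consumer's new promise; marks never move backwards.
        current = self.low_water[consumer_id]
        self.low_water[consumer_id] = max(current, next_ts)
        return self.collect()

    def collect(self):
        # Reclaim items at dead timestamps (the "space" concern).
        dead = sorted(ts for ts in self.items if self.is_dead(ts))
        for ts in dead:
            del self.items[ts]
        return dead


channel = Channel(["tracker", "display"])
for ts in range(5):
    channel.put(ts, f"frame-{ts}")
channel.advance("tracker", 3)   # the other consumer still needs timestamp 0
channel.advance("display", 2)   # now timestamps 0 and 1 are dead
```

In this sketch a single test, `is_dead`, drives both garbage collection on the channel and the decision to skip a computation, which mirrors the abstract's claim that one framework covers both the space and the time concern.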
Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040897
J. Sancho, A. Robles, J. Flich, P. López, J. Duato
The InfiniBand Architecture (IBA) defines a switch-based network with point-to-point links whose topology is arbitrarily established by the customer. We propose a simple and effective methodology for designing deadlock-free routing strategies that are able to route packets through minimal paths in InfiniBand networks. This methodology can meet the trade-off between network performance and the number of resources dedicated to deadlock avoidance. Evaluation results show that the resulting routing strategies significantly outperform up*/down* routing. In particular, throughput improvement ranges, on average, from 1.33 for small networks to 4.05 for large networks. Also, it is shown that just two virtual lanes and three service levels are enough to achieve more than 80% of the throughput improvement achieved by the best proposed routing strategy (the one that always provides minimal paths without limiting the number of resources).
Title: Effective methodology for deadlock-free minimal routing in InfiniBand networks. Venue: Proceedings International Conference on Parallel Processing. DOI: https://doi.org/10.1109/ICPP.2002.1040897
Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040912
Xiaobo Zhou, Chengzhong Xu
A cost-effective approach to building scalable video-on-demand (VoD) servers is to couple a number of VoD servers together in a cluster. In this article, we study a crucial video replication and placement problem in a distributed storage VoD cluster for high quality and high availability services. We formulate it as a combinatorial optimization problem with the objectives of maximizing the encoding bit rate and the number of replicas of each video and balancing the workload of the servers. It is subject to the constraints of the storage capacity and the outgoing network bandwidth of the servers. Under the assumption of a single fixed encoding bit rate for all videos, we give an optimal replication algorithm and a bounded-placement algorithm for videos with different popularities. To reduce the complexity of the replication algorithm, we present an efficient algorithm that utilizes Zipf-like video popularity distributions to approximate the optimal solution. For videos with scalable encoding bit rates, we propose a heuristic algorithm based on simulated annealing. We conduct a comprehensive performance evaluation of the algorithms and demonstrate their effectiveness via simulations over a synthetic workload set.
Title: Optimal video replication and placement on a cluster of video-on-demand servers. Venue: Proceedings International Conference on Parallel Processing. DOI: https://doi.org/10.1109/ICPP.2002.1040912
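To make the replication problem in the abstract above concrete, here is a minimal sketch of popularity-driven replication. It is not the paper's optimal or approximate algorithm; it is a simple greedy heuristic chosen for illustration. The function names, the skew parameter `alpha`, and the rule of at most one replica per server are assumptions. The sketch builds a Zipf-like popularity distribution, where the video of rank i has weight 1 / i^alpha, and then hands out spare storage slots to whichever video currently carries the highest load per replica.

```python
import heapq


def zipf_popularity(num_videos, alpha=1.0):
    """Zipf-like popularity: the video of rank i gets weight 1 / i**alpha."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_videos + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def allocate_replicas(popularity, num_servers, total_slots):
    """Greedy illustration: give each spare slot to the video whose
    per-replica load (popularity / replicas) is currently highest."""
    n = len(popularity)
    if total_slots < n:
        raise ValueError("need at least one storage slot per video")
    replicas = [1] * n  # every video is stored at least once
    # heapq is a min-heap, so loads are negated to pop the largest first.
    heap = [(-p, i) for i, p in enumerate(popularity)]
    heapq.heapify(heap)
    remaining = total_slots - n
    while remaining > 0 and heap:
        _, i = heapq.heappop(heap)
        if replicas[i] >= num_servers:
            continue  # assumed limit: at most one replica per server
        replicas[i] += 1
        remaining -= 1
        if replicas[i] < num_servers:
            heapq.heappush(heap, (-popularity[i] / replicas[i], i))
    return replicas


popularity = zipf_popularity(4, alpha=1.0)
replicas = allocate_replicas(popularity, num_servers=3, total_slots=7)
```

Because the greedy step always reduces the largest per-replica load, popular videos end up with more copies, which is the intuition behind replicating in proportion to popularity. Placement of those replicas on specific servers, and the bandwidth constraint mentioned in the abstract, are outside this sketch.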
Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040890
Cecilia Ekelin, Jan Jonsson
In this paper, we propose a pseudo-polynomial-time lower-bound algorithm for the problem of assigning and scheduling real-time tasks in a distributed system such that the network communication is minimized. The key feature of our algorithm is translating the task assignment problem into the so-called k-cut problem of a graph, which is known to be solvable in polynomial time for fixed k. Experiments show that the lower bound computed by our algorithm is in fact optimal in up to 89% of the cases and increases the speed of an overall optimization algorithm by a factor of two on average.
Title: A lower-bound algorithm for minimizing network communication in real-time systems. Venue: Proceedings International Conference on Parallel Processing. DOI: https://doi.org/10.1109/ICPP.2002.1040890
Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040899
S. Chuang, A. Chan, Jiannong Cao
Describes a Web proxy architecture called WebPADS, short for "Web Proxy for actively deployable services." WebPADS was developed to enhance Web applications running on a wireless network. WebPADS provides mechanisms to automatically locate and configure a flexible and adaptive wireless Web proxy. In addition, it provides a framework that facilitates the development of add-on services, where the services can be actively deployed and migrated across Web proxies in order to adapt to the changing wireless environment.
Title: Dynamic service composition for wireless Web access. Venue: Proceedings International Conference on Parallel Processing. DOI: https://doi.org/10.1109/ICPP.2002.1040899
Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040874
Chulho Won, Ben Lee, Chansu Yu, S. Moh, Yong-Youn Kim, K. Park
This paper presents Linux/SimOS, a Linux operating system port to SimOS, which is a complete machine simulator from Stanford. The motivation for Linux/SimOS is to alleviate the limitations of SimOS, which only supports proprietary operating systems. The contributions made in this paper are two-fold: First, the major modifications that were necessary to run Linux on SimOS are described. Second, a detailed analysis of the UDP/IP protocol and M-VIA is performed to demonstrate the capabilities of Linux/SimOS. The simulation study shows that Linux/SimOS is capable of capturing all aspects of communication performance, including the effects of the kernel, device drivers, and network interface.
Title: Linux/SimOS - a simulation environment for evaluating high-speed communication systems. Venue: Proceedings International Conference on Parallel Processing. DOI: https://doi.org/10.1109/ICPP.2002.1040874